{"cells":[{"cell_type":"markdown","metadata":{"trusted":true,"jupyter":{},"tags":[],"slideshow":{"slide_type":"slide"},"id":"D89B0EAB542642F98900CCC4D4E49360","runtime":{"status":"default","execution_status":null,"is_visible":false},"scrolled":false,"notebookId":"66cecedfc175d1eb8493abb3"},"source":"# Baseline: Hourly Ozone Concentration Prediction  \nGround-level ozone is an invisible killer under blue skies and white clouds; high ozone concentrations are seriously harmful to human health. In recent years, against the backdrop of global warming and urbanization, extreme summer heat has become frequent, and together with rising anthropogenic emissions this provides favorable precursors and formation conditions for ozone pollution. Ozone pollution exhibits a **nonlinear chemical response**: its formation is closely tied to the total amounts and ratio of its precursors, volatile organic compounds (VOCs) and nitrogen oxides (NOx), and it can also interact with other pollutants such as particulate matter. Ozone pollution also has pronounced **regional characteristics** and is extremely sensitive to meteorology, being strongly affected by local conditions such as **temperature, relative humidity, wind direction, and wind speed**. This competition task is to build an hourly ozone pollution prediction model **based on observations of meteorological variables and pollutant concentrations**.  \n\n\n![Image Name](https://cdn.kesci.com/upload/image/r5heuyucay.png)  \n\n## Import modules  \nInstall the pycaret module. The output will be absurdly long; that is normal."},{"cell_type":"code","metadata":{"id":"549F4C20CBFB4CFBA8FF356DB75A96D9","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"!pip install pycaret -i https://pypi.tuna.tsinghua.edu.cn/simple","outputs":[{"output_type":"stream","name":"stdout","text":"Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple\nCollecting pycaret\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/33/f7/204fb66dbbf83c2448f2d345c797ca170c2ac9d74b902d09ddbce637fbbd/pycaret-2.3.10-py3-none-any.whl (320 kB)\n     |████████████████████████████████| 320 kB 1.0 MB/s            \n\u001b[?25hRequirement already satisfied: joblib in /opt/conda/lib/python3.6/site-packages (from pycaret) (0.13.2)\nCollecting mlflow\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/f2/2f/ca661cd6ff93f11143566c3732ce0cccd1c9638ceaac9d6cf3e01460c3dc/mlflow-1.23.1-py3-none-any.whl (15.6 MB)\n     |████████████████████████████████| 15.6 MB 2.1 MB/s            \n\u001b[?25hCollecting lightgbm>=2.3.1\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/74/d1/2e4b02e4611ab36647639c4eea8c4520bb90f948563e00a3bec583a9f9f5/lightgbm-4.3.0.tar.gz (1.7 MB)\n     |████████████████████████████████| 
1.7 MB 1.4 MB/s            \n\u001b[?25h  Installing build dependencies ... \u001b[?25lerror\n\u001b[31m  ERROR: Command errored out with exit status 1:\n   command: /opt/conda/bin/python /opt/conda/lib/python3.6/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-h_x8j9q0/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.tuna.tsinghua.edu.cn/simple -- 'scikit-build-core>=0.4.4'\n       cwd: None\n  Complete output (12 lines):\n  Traceback (most recent call last):\n    File \"/opt/conda/lib/python3.6/runpy.py\", line 193, in _run_module_as_main\n      \"__main__\", mod_spec)\n    File \"/opt/conda/lib/python3.6/runpy.py\", line 85, in _run_code\n      exec(code, run_globals)\n    File \"/opt/conda/lib/python3.6/site-packages/pip/__main__.py\", line 27, in <module>\n      \"ignore\", category=DeprecationWarning, module=\".*packaging\\\\.version\"\n    File \"/opt/conda/lib/python3.6/warnings.py\", line 131, in filterwarnings\n      import re\n    File \"/opt/conda/lib/python3.6/re.py\", line 142, in <module>\n      class RegexFlag(enum.IntFlag):\n  AttributeError: module 'enum' has no attribute 'IntFlag'\n  ----------------------------------------\u001b[0m\n\u001b[33mWARNING: Discarding https://pypi.tuna.tsinghua.edu.cn/packages/74/d1/2e4b02e4611ab36647639c4eea8c4520bb90f948563e00a3bec583a9f9f5/lightgbm-4.3.0.tar.gz#sha256=006f5784a9bcee43e5a7e943dc4f02de1ba2ee7a7af1ee5f190d383f3b6c9ebe (from https://pypi.tuna.tsinghua.edu.cn/simple/lightgbm/) (requires-python:>=3.6). 
Command errored out with exit status 1: /opt/conda/bin/python /opt/conda/lib/python3.6/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-h_x8j9q0/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.tuna.tsinghua.edu.cn/simple -- 'scikit-build-core>=0.4.4' Check the logs for full command output.\u001b[0m\n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/92/3d/209b257b7bcfaa39d54b9ab22db5097ff110c6a2f2399d5244c76c43aba2/lightgbm-4.2.0.tar.gz (1.7 MB)\n     |████████████████████████████████| 1.7 MB 109.3 MB/s            \n\u001b[?25h  Installing build dependencies ... \u001b[?25lerror\n\u001b[31m  ERROR: Command errored out with exit status 1:\n   command: /opt/conda/bin/python /opt/conda/lib/python3.6/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-nmwp8d9z/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.tuna.tsinghua.edu.cn/simple -- 'scikit-build-core>=0.4.4'\n       cwd: None\n  Complete output (12 lines):\n  Traceback (most recent call last):\n    File \"/opt/conda/lib/python3.6/runpy.py\", line 193, in _run_module_as_main\n      \"__main__\", mod_spec)\n    File \"/opt/conda/lib/python3.6/runpy.py\", line 85, in _run_code\n      exec(code, run_globals)\n    File \"/opt/conda/lib/python3.6/site-packages/pip/__main__.py\", line 27, in <module>\n      \"ignore\", category=DeprecationWarning, module=\".*packaging\\\\.version\"\n    File \"/opt/conda/lib/python3.6/warnings.py\", line 131, in filterwarnings\n      import re\n    File \"/opt/conda/lib/python3.6/re.py\", line 142, in <module>\n      class RegexFlag(enum.IntFlag):\n  AttributeError: module 'enum' has no attribute 'IntFlag'\n  ----------------------------------------\u001b[0m\n\u001b[33mWARNING: Discarding 
https://pypi.tuna.tsinghua.edu.cn/packages/92/3d/209b257b7bcfaa39d54b9ab22db5097ff110c6a2f2399d5244c76c43aba2/lightgbm-4.2.0.tar.gz#sha256=8a4d051df2ab2218998a16f7712e843ee9e96d8b09ffbfcc18533da127e0da02 (from https://pypi.tuna.tsinghua.edu.cn/simple/lightgbm/) (requires-python:>=3.6). Command errored out with exit status 1: /opt/conda/bin/python /opt/conda/lib/python3.6/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-nmwp8d9z/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.tuna.tsinghua.edu.cn/simple -- 'scikit-build-core>=0.4.4' Check the logs for full command output.\u001b[0m\n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/98/a9/01f50aee85949ba713733b69a3f0b42d39719a414a0e29bdf2a9f05ecc53/lightgbm-4.1.0.tar.gz (1.7 MB)\n     |████████████████████████████████| 1.7 MB 1.4 MB/s            \n\u001b[?25h  Installing build dependencies ... \u001b[?25lerror\n\u001b[31m  ERROR: Command errored out with exit status 1:\n   command: /opt/conda/bin/python /opt/conda/lib/python3.6/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-4hb5r0ot/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.tuna.tsinghua.edu.cn/simple -- 'scikit-build-core>=0.4.4'\n       cwd: None\n  Complete output (12 lines):\n  Traceback (most recent call last):\n    File \"/opt/conda/lib/python3.6/runpy.py\", line 193, in _run_module_as_main\n      \"__main__\", mod_spec)\n    File \"/opt/conda/lib/python3.6/runpy.py\", line 85, in _run_code\n      exec(code, run_globals)\n    File \"/opt/conda/lib/python3.6/site-packages/pip/__main__.py\", line 27, in <module>\n      \"ignore\", category=DeprecationWarning, module=\".*packaging\\\\.version\"\n    File \"/opt/conda/lib/python3.6/warnings.py\", line 131, in filterwarnings\n      import re\n    File \"/opt/conda/lib/python3.6/re.py\", line 142, in <module>\n      class 
RegexFlag(enum.IntFlag):\n  AttributeError: module 'enum' has no attribute 'IntFlag'\n  ----------------------------------------\u001b[0m\n\u001b[33mWARNING: Discarding https://pypi.tuna.tsinghua.edu.cn/packages/98/a9/01f50aee85949ba713733b69a3f0b42d39719a414a0e29bdf2a9f05ecc53/lightgbm-4.1.0.tar.gz#sha256=bee59dd269a93b093f2c610d4a6683a7ea87c63d3ea35c622123ce2c020b2abc (from https://pypi.tuna.tsinghua.edu.cn/simple/lightgbm/) (requires-python:>=3.6). Command errored out with exit status 1: /opt/conda/bin/python /opt/conda/lib/python3.6/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-4hb5r0ot/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.tuna.tsinghua.edu.cn/simple -- 'scikit-build-core>=0.4.4' Check the logs for full command output.\u001b[0m\n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d8/61/4165b1caf07d661c4f0241534bbc18748e49e1ddb849fd9908c36c1d622c/lightgbm-4.0.0.tar.gz (1.7 MB)\n     |████████████████████████████████| 1.7 MB 2.4 MB/s             \n\u001b[?25h  Installing build dependencies ... 
\u001b[?25lerror\n\u001b[31m  ERROR: Command errored out with exit status 1:\n   command: /opt/conda/bin/python /opt/conda/lib/python3.6/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-i2o_0a3j/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.tuna.tsinghua.edu.cn/simple -- 'scikit-build-core>=0.4.4'\n       cwd: None\n  Complete output (12 lines):\n  Traceback (most recent call last):\n    File \"/opt/conda/lib/python3.6/runpy.py\", line 193, in _run_module_as_main\n      \"__main__\", mod_spec)\n    File \"/opt/conda/lib/python3.6/runpy.py\", line 85, in _run_code\n      exec(code, run_globals)\n    File \"/opt/conda/lib/python3.6/site-packages/pip/__main__.py\", line 27, in <module>\n      \"ignore\", category=DeprecationWarning, module=\".*packaging\\\\.version\"\n    File \"/opt/conda/lib/python3.6/warnings.py\", line 131, in filterwarnings\n      import re\n    File \"/opt/conda/lib/python3.6/re.py\", line 142, in <module>\n      class RegexFlag(enum.IntFlag):\n  AttributeError: module 'enum' has no attribute 'IntFlag'\n  ----------------------------------------\u001b[0m\n\u001b[33mWARNING: Discarding https://pypi.tuna.tsinghua.edu.cn/packages/d8/61/4165b1caf07d661c4f0241534bbc18748e49e1ddb849fd9908c36c1d622c/lightgbm-4.0.0.tar.gz#sha256=03d1b3903aa51cd9a5e3965941236f2a7bf5a69d7a76059dbf68fd9b4fc92d8f (from https://pypi.tuna.tsinghua.edu.cn/simple/lightgbm/) (requires-python:>=3.6). 
Command errored out with exit status 1: /opt/conda/bin/python /opt/conda/lib/python3.6/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-i2o_0a3j/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.tuna.tsinghua.edu.cn/simple -- 'scikit-build-core>=0.4.4' Check the logs for full command output.\u001b[0m\n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/38/5c/d9773cf0ea7938f3b777eaacc6f9d58f69ca76a667771364ffefed9095b4/lightgbm-3.3.5-py3-none-manylinux1_x86_64.whl (2.0 MB)\n     |████████████████████████████████| 2.0 MB 1.4 MB/s            \n\u001b[?25hCollecting Boruta\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/30/de/37bb80bba7fb7baa703b78fab37487b21b43fe4bb5d5a1ab09ecab9b76c6/Boruta-0.4.3-py3-none-any.whl (57 kB)\n     |████████████████████████████████| 57 kB 9.7 MB/s              \n\u001b[?25hCollecting mlxtend>=0.17.0\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/1c/07/512f6a780239ad6ce06ce2aa7b4067583f5ddcfc7703a964a082c706a070/mlxtend-0.23.1-py3-none-any.whl (1.4 MB)\n     |████████████████████████████████| 1.4 MB 122.3 MB/s            \n\u001b[?25hCollecting imbalanced-learn==0.7.0\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c8/81/8db4d87b03b998fda7c6f835d807c9ae4e3b141f978597b8d7f31600be15/imbalanced_learn-0.7.0-py3-none-any.whl (167 kB)\n     |████████████████████████████████| 167 kB 1.9 MB/s            \n\u001b[?25hCollecting plotly>=4.4.1\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/a8/07/72953cf70e3bd3a24cbc3e743e6f8539abe6e3e6d83c3c0c83426eaffd39/plotly-5.18.0-py3-none-any.whl (15.6 MB)\n     |████████████████████████████████| 15.6 MB 3.0 MB/s            \n\u001b[?25hRequirement already satisfied: matplotlib in /opt/conda/lib/python3.6/site-packages (from pycaret) (3.1.2)\nRequirement already satisfied: seaborn in /opt/conda/lib/python3.6/site-packages (from pycaret) (0.9.0)\nRequirement already satisfied: 
textblob in /opt/conda/lib/python3.6/site-packages (from pycaret) (0.15.1)\nRequirement already satisfied: IPython in /opt/conda/lib/python3.6/site-packages (from pycaret) (7.2.0)\nCollecting kmodes>=0.10.1\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/1a/a8/0d3bf6f3340cbcb8cf4ad02c306d157af8f09ce86aadf5346e00605870dd/kmodes-0.12.2-py2.py3-none-any.whl (20 kB)\nRequirement already satisfied: pandas in /opt/conda/lib/python3.6/site-packages (from pycaret) (0.24.2)\nRequirement already satisfied: wordcloud in /opt/conda/lib/python3.6/site-packages (from pycaret) (1.5.0)\nRequirement already satisfied: ipywidgets in /opt/conda/lib/python3.6/site-packages (from pycaret) (7.5.0)\nCollecting yellowbrick>=1.0.1\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/06/35/c7d44bb541c06bc41b3239b27af79ea0ecc7dbb156ee1335576f99c58b91/yellowbrick-1.5-py3-none-any.whl (282 kB)\n     |████████████████████████████████| 282 kB 113.4 MB/s            \n\u001b[?25hRequirement already satisfied: scipy<=1.5.4 in /opt/conda/lib/python3.6/site-packages (from pycaret) (1.2.0)\nRequirement already satisfied: nltk in /opt/conda/lib/python3.6/site-packages (from pycaret) (3.4.1)\nCollecting cufflinks>=0.17.0\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/1a/18/4d32edaaf31ba4af9745dac676c4a28c48d3fc539000c29e855bd8db3b86/cufflinks-0.17.3.tar.gz (81 kB)\n     |████████████████████████████████| 81 kB 1.5 MB/s             \n\u001b[?25h  Preparing metadata (setup.py) ... 
\u001b[?25ldone\n\u001b[?25hRequirement already satisfied: pyLDAvis in /opt/conda/lib/python3.6/site-packages (from pycaret) (2.1.1)\nCollecting umap-learn\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d1/1b/46802a050b1c55d10c4f59fc6afd2b45ac9b4f62b2e12092d3f599286f14/umap_learn-0.5.6-py3-none-any.whl (85 kB)\n     |████████████████████████████████| 85 kB 5.3 MB/s             \n\u001b[?25hCollecting scikit-learn==0.23.2\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/5c/a1/273def87037a7fb010512bbc5901c31cfddfca8080bc63b42b26e3cc55b3/scikit_learn-0.23.2-cp36-cp36m-manylinux1_x86_64.whl (6.8 MB)\n     |████████████████████████████████| 6.8 MB 1.4 MB/s            \n\u001b[?25hRequirement already satisfied: gensim<4.0.0 in /opt/conda/lib/python3.6/site-packages (from pycaret) (3.7.3)\nRequirement already satisfied: numba<0.55 in /opt/conda/lib/python3.6/site-packages (from pycaret) (0.44.1)\nCollecting pyod\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/0d/af/4529584c562fcbce45f80e2f847a5cbefd8d6eb20d1af57c673f77bb3060/pyod-2.0.1.tar.gz (163 kB)\n     |████████████████████████████████| 163 kB 3.3 MB/s            \n\u001b[?25h  Preparing metadata (setup.py) ... 
\u001b[?25ldone\n\u001b[?25hRequirement already satisfied: pyyaml<6.0.0 in /opt/conda/lib/python3.6/site-packages (from pycaret) (5.3.1)\nRequirement already satisfied: spacy<2.4.0 in /opt/conda/lib/python3.6/site-packages (from pycaret) (2.1.4)\nCollecting pandas-profiling>=2.8.0\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/57/b7/e13216265ae3135ecda13e966aad9ce04b7e7b3e2d87d056b032fc9f457c/pandas_profiling-3.2.0-py2.py3-none-any.whl (262 kB)\n     |████████████████████████████████| 262 kB 2.4 MB/s            \n\u001b[?25hCollecting scikit-plot\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/7c/47/32520e259340c140a4ad27c1b97050dd3254fdc517b1d59974d47037510e/scikit_plot-0.3.7-py3-none-any.whl (33 kB)\nRequirement already satisfied: numpy>=1.13.3 in /opt/conda/lib/python3.6/site-packages (from imbalanced-learn==0.7.0->pycaret) (1.16.3)\nCollecting threadpoolctl>=2.0.0\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/61/cf/6e354304bcb9c6413c4e02a747b600061c21d38ba51e7e544ac7bc66aecc/threadpoolctl-3.1.0-py3-none-any.whl (14 kB)\nRequirement already satisfied: six>=1.9.0 in /opt/conda/lib/python3.6/site-packages (from cufflinks>=0.17.0->pycaret) (1.15.0)\nRequirement already satisfied: colorlover>=0.2.1 in /opt/conda/lib/python3.6/site-packages (from cufflinks>=0.17.0->pycaret) (0.3.0)\nRequirement already satisfied: setuptools>=34.4.1 in /opt/conda/lib/python3.6/site-packages (from cufflinks>=0.17.0->pycaret) (49.2.0)\nRequirement already satisfied: smart-open>=1.7.0 in /opt/conda/lib/python3.6/site-packages (from gensim<4.0.0->pycaret) (1.8.4)\nRequirement already satisfied: jedi>=0.10 in /opt/conda/lib/python3.6/site-packages (from IPython->pycaret) (0.13.2)\nRequirement already satisfied: decorator in /opt/conda/lib/python3.6/site-packages (from IPython->pycaret) (4.4.2)\nRequirement already satisfied: pickleshare in /opt/conda/lib/python3.6/site-packages (from IPython->pycaret) (0.7.5)\nRequirement already satisfied: traitlets>=4.2 
in /opt/conda/lib/python3.6/site-packages (from IPython->pycaret) (4.3.3)\nRequirement already satisfied: prompt_toolkit<2.1.0,>=2.0.0 in /opt/conda/lib/python3.6/site-packages (from IPython->pycaret) (2.0.7)\nRequirement already satisfied: pygments in /opt/conda/lib/python3.6/site-packages (from IPython->pycaret) (2.3.1)\nRequirement already satisfied: backcall in /opt/conda/lib/python3.6/site-packages (from IPython->pycaret) (0.1.0)\nRequirement already satisfied: pexpect in /opt/conda/lib/python3.6/site-packages (from IPython->pycaret) (4.6.0)\nRequirement already satisfied: ipykernel>=4.5.1 in /opt/conda/lib/python3.6/site-packages (from ipywidgets->pycaret) (5.1.0)\nRequirement already satisfied: nbformat>=4.2.0 in /opt/conda/lib/python3.6/site-packages (from ipywidgets->pycaret) (5.0.7)\nRequirement already satisfied: widgetsnbextension~=3.5.0 in /opt/conda/lib/python3.6/site-packages (from ipywidgets->pycaret) (3.5.0)\nRequirement already satisfied: wheel in /opt/conda/lib/python3.6/site-packages (from lightgbm>=2.3.1->pycaret) (0.30.0)\nCollecting scipy<=1.5.4\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c8/89/63171228d5ced148f5ced50305c89e8576ffc695a90b58fe5bb602b910c2/scipy-1.5.4-cp36-cp36m-manylinux1_x86_64.whl (25.9 MB)\n     |████████████████████████████████| 25.9 MB 3.0 MB/s              \n\u001b[?25hCollecting mlxtend>=0.17.0\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/73/da/d5d77a9a7a135c948dbf8d3b873655b105a152d69e590150c83d23c3d070/mlxtend-0.23.0-py3-none-any.whl (1.4 MB)\n     |████████████████████████████████| 1.4 MB 11.1 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/40/cc/fc7a27d740d4eb570c9c6db6f7a89cf72b1d50d00b923b4ef0b83e4a947d/mlxtend-0.22.0-py2.py3-none-any.whl (1.4 MB)\n     |████████████████████████████████| 1.4 MB 10.8 MB/s            \n\u001b[?25h  Downloading 
https://pypi.tuna.tsinghua.edu.cn/packages/c7/89/42528c20f6c696f1b8ae7407bd2edf20291a071fd939400ecc0df87d895c/mlxtend-0.21.0-py2.py3-none-any.whl (1.3 MB)\n     |████████████████████████████████| 1.3 MB 1.5 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/45/89/492924d6fc2cc9524f90febd0e9f7487c02261a8689c7c97348b09d0d071/mlxtend-0.20.0-py2.py3-none-any.whl (1.3 MB)\n     |████████████████████████████████| 1.3 MB 128.1 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/2a/4f/11a257bc17f675691080219c6fe3525e49c7077535c3d64c0c2afc79cfc9/mlxtend-0.19.0-py2.py3-none-any.whl (1.3 MB)\n     |████████████████████████████████| 1.3 MB 1.3 MB/s            \n\u001b[?25hRequirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.6/site-packages (from matplotlib->pycaret) (0.10.0)\nRequirement already satisfied: python-dateutil>=2.1 in /opt/conda/lib/python3.6/site-packages (from matplotlib->pycaret) (2.8.1)\nRequirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.6/site-packages (from matplotlib->pycaret) (1.1.0)\nRequirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /opt/conda/lib/python3.6/site-packages (from matplotlib->pycaret) (2.1.10)\nRequirement already satisfied: llvmlite>=0.29.0 in /opt/conda/lib/python3.6/site-packages (from numba<0.55->pycaret) (0.29.0)\nRequirement already satisfied: pytz>=2011k in /opt/conda/lib/python3.6/site-packages (from pandas->pycaret) (2019.1)\nCollecting htmlmin>=0.1.12\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b3/e7/fcd59e12169de19f0131ff2812077f964c6b960e7c09804d30a7bf2ab461/htmlmin-0.1.12.tar.gz (19 kB)\n  Preparing metadata (setup.py) ... 
\u001b[?25ldone\n\u001b[?25hCollecting jinja2>=2.11.1\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/20/9a/e5d9ec41927401e41aea8af6d16e78b5e612bca4699d417f646a9610a076/Jinja2-3.0.3-py3-none-any.whl (133 kB)\n     |████████████████████████████████| 133 kB 3.4 MB/s            \n\u001b[?25hCollecting matplotlib\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/09/03/b7b30fa81cb687d1178e085d0f01111ceaea3bf81f9330c937fb6f6c8ca0/matplotlib-3.3.4-cp36-cp36m-manylinux1_x86_64.whl (11.5 MB)\n     |████████████████████████████████| 11.5 MB 10.2 MB/s            \n\u001b[?25hCollecting seaborn\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/10/5b/0479d7d845b5ba410ca702ffcd7f2cd95a14a4dfff1fde2637802b258b9b/seaborn-0.11.2-py3-none-any.whl (292 kB)\n     |████████████████████████████████| 292 kB 11.7 MB/s            \n\u001b[?25hCollecting pandas-profiling>=2.8.0\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b8/bb/7d1a8523711c7022601d17a8449b822dc5ffaf272692d3538771b0538631/pandas_profiling-3.1.0-py2.py3-none-any.whl (261 kB)\n     |████████████████████████████████| 261 kB 1.3 MB/s            \n\u001b[?25hCollecting markupsafe~=2.0.1\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/e2/a9/eafee9babd4b3aed918d286fbe1c20d1a22d347b30d2bddb3c49919548fa/MarkupSafe-2.0.1-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (30 kB)\nCollecting tangled-up-in-unicode==0.1.0\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/93/3e/cb354fb2097fcf2fd5b5a342b10ae2a6e9363ba435b64e3e00c414064bc7/tangled_up_in_unicode-0.1.0-py3-none-any.whl (3.1 MB)\n     |████████████████████████████████| 3.1 MB 11.8 MB/s            \n\u001b[?25hCollecting pandas\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c3/e2/00cacecafbab071c787019f00ad84ca3185952f6bb9bca9550ed83870d4d/pandas-1.1.5-cp36-cp36m-manylinux1_x86_64.whl (9.5 MB)\n     |████████████████████████████████| 9.5 MB 1.5 MB/s            
\n\u001b[?25hCollecting missingno>=0.4.2\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/87/22/cd5cf999af21c2f97486622c551ac3d07361ced8125121e907f588ff5f24/missingno-0.5.2-py3-none-any.whl (8.7 kB)\nCollecting multimethod>=1.4\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/13/4e/899c33da18671c2cf47f0d43b232fc220d465e37e90d7a151c261779416b/multimethod-1.5-py3-none-any.whl (7.7 kB)\nCollecting phik>=0.11.1\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c7/f3/41f78b7ace5472fea5e6ad98d4b79c642a3563adfecb425ff0b8e681f2bc/phik-0.12.0-cp36-cp36m-manylinux2010_x86_64.whl (675 kB)\n     |████████████████████████████████| 675 kB 2.4 MB/s            \n\u001b[?25hRequirement already satisfied: tqdm>=4.48.2 in /opt/conda/lib/python3.6/site-packages (from pandas-profiling>=2.8.0->pycaret) (4.49.0)\nCollecting visions[type_image_path]==0.7.4\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/66/00/166b2beb8046f06b77a2bf2c1dafeb52eff608f7dd420c767d5f3ce36ef5/visions-0.7.4-py3-none-any.whl (102 kB)\n     |████████████████████████████████| 102 kB 125 kB/s            \n\u001b[?25hCollecting pydantic>=1.8.1\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c2/6e/f4d724c7004cace580bd3c7ee6be87f4607dda0249574235d26d19b4258c/pydantic-1.9.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.2 MB)\n     |████████████████████████████████| 11.2 MB 354 kB/s            \n\u001b[?25hRequirement already satisfied: requests>=2.24.0 in /opt/conda/lib/python3.6/site-packages (from pandas-profiling>=2.8.0->pycaret) (2.26.0)\nCollecting joblib\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/55/85/70c6602b078bd9e6f3da4f467047e906525c355a4dacd4f71b97a35d9897/joblib-1.0.1-py3-none-any.whl (303 kB)\n     |████████████████████████████████| 303 kB 13.7 MB/s            \n\u001b[?25hCollecting networkx>=2.4\n  Downloading 
https://pypi.tuna.tsinghua.edu.cn/packages/f3/b7/c7f488101c0bb5e4178f3cde416004280fd40262433496830de8a8c21613/networkx-2.5.1-py3-none-any.whl (1.6 MB)\n     |████████████████████████████████| 1.6 MB 1.8 MB/s            \n\u001b[?25hRequirement already satisfied: attrs>=19.3.0 in /opt/conda/lib/python3.6/site-packages (from visions[type_image_path]==0.7.4->pandas-profiling>=2.8.0->pycaret) (19.3.0)\nCollecting imagehash\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/2d/b4/19a746a986c6e38595fa5947c028b1b8e287773dcad766e648897ad2a4cf/ImageHash-4.3.1-py2.py3-none-any.whl (296 kB)\n     |████████████████████████████████| 296 kB 1.1 MB/s            \n\u001b[?25hRequirement already satisfied: Pillow in /opt/conda/lib/python3.6/site-packages (from visions[type_image_path]==0.7.4->pandas-profiling>=2.8.0->pycaret) (5.3.0)\nCollecting Pillow\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ea/0f/2fa195c2d8c6fe0b3dc2df5fc6ac6b8dbd005ea30aaa0fa43eca88b8c664/Pillow-8.4.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)\n     |████████████████████████████████| 3.1 MB 8.5 MB/s            \n\u001b[?25hRequirement already satisfied: packaging in /opt/conda/lib/python3.6/site-packages (from plotly>=4.4.1->pycaret) (19.0)\nRequirement already satisfied: tenacity>=6.2.0 in /opt/conda/lib/python3.6/site-packages/tenacity-6.2.0-py3.6.egg (from plotly>=4.4.1->pycaret) (6.2.0)\nRequirement already satisfied: wasabi<1.1.0,>=0.2.0 in /opt/conda/lib/python3.6/site-packages (from spacy<2.4.0->pycaret) (0.2.2)\nRequirement already satisfied: blis<0.3.0,>=0.2.2 in /opt/conda/lib/python3.6/site-packages (from spacy<2.4.0->pycaret) (0.2.4)\nRequirement already satisfied: srsly<1.1.0,>=0.0.5 in /opt/conda/lib/python3.6/site-packages (from spacy<2.4.0->pycaret) (0.0.7)\nRequirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /opt/conda/lib/python3.6/site-packages (from spacy<2.4.0->pycaret) (1.0.2)\nRequirement already satisfied: preshed<2.1.0,>=2.0.1 in 
/opt/conda/lib/python3.6/site-packages (from spacy<2.4.0->pycaret) (2.0.1)\nRequirement already satisfied: plac<1.0.0,>=0.9.6 in /opt/conda/lib/python3.6/site-packages (from spacy<2.4.0->pycaret) (0.9.6)\nRequirement already satisfied: thinc<7.1.0,>=7.0.2 in /opt/conda/lib/python3.6/site-packages (from spacy<2.4.0->pycaret) (7.0.4)\nCollecting jsonschema<3.1.0,>=2.6.0\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/54/48/f5f11003ceddcd4ad292d4d9b5677588e9169eef41f88e38b2888e7ec6c4/jsonschema-3.0.2-py2.py3-none-any.whl (54 kB)\n     |████████████████████████████████| 54 kB 986 kB/s             \n\u001b[?25hRequirement already satisfied: cymem<2.1.0,>=2.0.2 in /opt/conda/lib/python3.6/site-packages (from spacy<2.4.0->pycaret) (2.0.2)\nCollecting yellowbrick>=1.0.1\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/a4/dc/fb17b2aa792d67353456899c36e8e2a4dfe284e9ed3124f85fc3879cea2a/yellowbrick-1.4-py3-none-any.whl (274 kB)\n     |████████████████████████████████| 274 kB 14.1 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/3a/15/58feb940b6a2f52d3335cccf9e5d00704ec5ba62782da83f7e2abeca5e4b/yellowbrick-1.3.post1-py3-none-any.whl (271 kB)\n     |████████████████████████████████| 271 kB 1.3 MB/s            \n\u001b[?25hCollecting alembic\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b3/e2/8d48220731b7279911c43e95cd182961a703b939de6822b00de3ea0d3159/alembic-1.7.7-py3-none-any.whl (210 kB)\n     |████████████████████████████████| 210 kB 9.8 MB/s            \n\u001b[?25hCollecting databricks-cli>=0.8.7\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/0b/5f/e032f8679bf4c7f88ac68d285b0caa53bf428127e5e60dd7e6c99585f582/databricks-cli-0.17.8.tar.gz (85 kB)\n     |████████████████████████████████| 85 kB 318 kB/s             \n\u001b[?25h  Preparing metadata (setup.py) ... 
\u001b[?25ldone\n\u001b[?25hCollecting gunicorn\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/0e/2a/c3a878eccb100ccddf45c50b6b8db8cf3301a6adede6e31d48e8531cab13/gunicorn-21.2.0-py3-none-any.whl (80 kB)\n     |████████████████████████████████| 80 kB 1.7 MB/s             \n\u001b[?25hRequirement already satisfied: cloudpickle in /opt/conda/lib/python3.6/site-packages (from mlflow->pycaret) (1.2.1)\nCollecting gitpython>=2.1.0\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/bc/91/b38c4fabb6e5092ab23492ded4f318ab7299b19263272b703478038c0fbc/GitPython-3.1.18-py3-none-any.whl (170 kB)\n     |████████████████████████████████| 170 kB 8.5 MB/s             \n\u001b[?25hRequirement already satisfied: click>=7.0 in /opt/conda/lib/python3.6/site-packages (from mlflow->pycaret) (7.0)\nRequirement already satisfied: docker>=4.0.0 in /opt/conda/lib/python3.6/site-packages (from mlflow->pycaret) (4.0.2)\nCollecting sqlparse>=0.3.1\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/98/5a/66d7c9305baa9f11857f247d4ba761402cea75db6058ff850ed7128957b7/sqlparse-0.4.4-py3-none-any.whl (41 kB)\n     |████████████████████████████████| 41 kB 287 kB/s             \n\u001b[?25hRequirement already satisfied: protobuf>=3.7.0 in /opt/conda/lib/python3.6/site-packages (from mlflow->pycaret) (3.8.0)\nRequirement already satisfied: sqlalchemy in /opt/conda/lib/python3.6/site-packages (from mlflow->pycaret) (1.3.5)\nCollecting importlib-metadata!=4.7.0,>=3.7.0\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/a0/a1/b153a0a4caf7a7e3f15c2cd56c7702e2cf3d89b1b359d1f1c5e59d68f4ce/importlib_metadata-4.8.3-py3-none-any.whl (17 kB)\nCollecting prometheus-flask-exporter\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/78/17/54b7f92f25de491f746002e1e734fdc260edf2dd75c4777c7b77b49b7e31/prometheus_flask_exporter-0.23.1-py3-none-any.whl (18 kB)\nCollecting querystring-parser\n  Downloading 
https://pypi.tuna.tsinghua.edu.cn/packages/88/6b/572b2590fd55114118bf08bde63c0a421dcc82d593700f3e2ad89908a8a9/querystring_parser-1.2.4-py2.py3-none-any.whl (7.9 kB)\nRequirement already satisfied: entrypoints in /opt/conda/lib/python3.6/site-packages (from mlflow->pycaret) (0.2.3)\nRequirement already satisfied: Flask in /opt/conda/lib/python3.6/site-packages (from mlflow->pycaret) (1.1.1)\nRequirement already satisfied: funcy in /opt/conda/lib/python3.6/site-packages (from pyLDAvis->pycaret) (1.12)\nRequirement already satisfied: pytest in /opt/conda/lib/python3.6/site-packages (from pyLDAvis->pycaret) (5.0.1)\nRequirement already satisfied: future in /opt/conda/lib/python3.6/site-packages (from pyLDAvis->pycaret) (0.17.1)\nRequirement already satisfied: numexpr in /opt/conda/lib/python3.6/site-packages (from pyLDAvis->pycaret) (2.6.9)\nCollecting numpy>=1.13.3\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/14/32/d3fa649ad7ec0b82737b92fefd3c4dd376b0bb23730715124569f38f3a08/numpy-1.19.5-cp36-cp36m-manylinux2010_x86_64.whl (14.8 MB)\n     |████████████████████████████████| 14.8 MB 4.4 MB/s            \n\u001b[?25hCollecting numba<0.55\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/4a/c1/e7fdbfc886a9d9c11767533903db0d816c0f656fd6029f4a061742893694/numba-0.53.1-cp36-cp36m-manylinux2014_x86_64.whl (3.4 MB)\n     |████████████████████████████████| 3.4 MB 1.3 MB/s            \n\u001b[?25hCollecting llvmlite>=0.29.0\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/4d/5a/707cc7e072d71bc19869d093e5cf9b7be98cb42d2398489465474d007ce8/llvmlite-0.36.0-cp36-cp36m-manylinux2010_x86_64.whl (25.3 MB)\n     |████████████████████████████████| 25.3 MB 810 kB/s               | 14.7 MB 2.1 MB/s eta 0:00:06\n\u001b[?25hCollecting pynndescent>=0.5\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d2/53/d23a97e0a2c690d40b165d1062e2c4ccc796be458a1ce59f6ba030434663/pynndescent-0.5.13-py3-none-any.whl (56 kB)\n     |████████████████████████████████| 56 
kB 983 kB/s             \n\u001b[?25hRequirement already satisfied: pyjwt>=1.7.0 in /opt/conda/lib/python3.6/site-packages (from databricks-cli>=0.8.7->mlflow->pycaret) (2.1.0)\nCollecting oauthlib>=3.1.0\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/7e/80/cab10959dc1faead58dc8384a781dfbf93cb4d33d50988f7a69f1b7c9bbe/oauthlib-3.2.2-py3-none-any.whl (151 kB)\n     |████████████████████████████████| 151 kB 2.4 MB/s            \n\u001b[?25hCollecting tabulate>=0.7.7\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/92/4e/e5a13fdb3e6f81ce11893523ff289870c87c8f1f289a7369fb0e9840c3bb/tabulate-0.8.10-py3-none-any.whl (29 kB)\nCollecting urllib3<2.0.0,>=1.26.7\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ae/6a/99eaaeae8becaa17a29aeb334a18e5d582d873b6f084c11f02581b8d7f7f/urllib3-1.26.19-py2.py3-none-any.whl (143 kB)\n     |████████████████████████████████| 143 kB 73.6 MB/s            \n\u001b[?25hRequirement already satisfied: websocket-client>=0.32.0 in /opt/conda/lib/python3.6/site-packages (from docker>=4.0.0->mlflow->pycaret) (0.56.0)\nRequirement already satisfied: typing-extensions>=3.7.4.0 in /opt/conda/lib/python3.6/site-packages (from gitpython>=2.1.0->mlflow->pycaret) (3.7.4)\nCollecting gitdb<5,>=4.0.1\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/a3/7c/5d747655049bfbf75b5fcec57c8115896cb78d6fafa84f6d3ef4c0f13a98/gitdb-4.0.9-py3-none-any.whl (63 kB)\n     |████████████████████████████████| 63 kB 2.8 MB/s             \n\u001b[?25hRequirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.6/site-packages (from importlib-metadata!=4.7.0,>=3.7.0->mlflow->pycaret) (3.1.0)\nRequirement already satisfied: jupyter_client in /opt/conda/lib/python3.6/site-packages (from ipykernel>=4.5.1->ipywidgets->pycaret) (6.1.6)\nRequirement already satisfied: tornado>=4.2 in /opt/conda/lib/python3.6/site-packages (from ipykernel>=4.5.1->ipywidgets->pycaret) (4.5.3)\nRequirement already satisfied: parso>=0.3.0 in 
/opt/conda/lib/python3.6/site-packages (from jedi>=0.10->IPython->pycaret) (0.3.1)\nRequirement already satisfied: pyrsistent>=0.14.0 in /opt/conda/lib/python3.6/site-packages (from jsonschema<3.1.0,>=2.6.0->spacy<2.4.0->pycaret) (0.16.0)\nRequirement already satisfied: jupyter-core in /opt/conda/lib/python3.6/site-packages (from nbformat>=4.2.0->ipywidgets->pycaret) (4.6.3)\nRequirement already satisfied: ipython-genutils in /opt/conda/lib/python3.6/site-packages (from nbformat>=4.2.0->ipywidgets->pycaret) (0.2.0)\nRequirement already satisfied: wcwidth in /opt/conda/lib/python3.6/site-packages (from prompt_toolkit<2.1.0,>=2.0.0->IPython->pycaret) (0.1.7)\nCollecting typing-extensions>=3.7.4.0\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/45/6b/44f7f8f1e110027cf88956b59f2fad776cca7e1704396d043f89effd3a0e/typing_extensions-4.1.1-py3-none-any.whl (26 kB)\nCollecting dataclasses>=0.6\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/fe/ca/75fac5856ab5cfa51bbbcefa250182e50441074fdc3f803f6e76451fab43/dataclasses-0.8-py3-none-any.whl (19 kB)\nRequirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.6/site-packages (from requests>=2.24.0->pandas-profiling>=2.8.0->pycaret) (3.2)\nRequirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.6/site-packages (from requests>=2.24.0->pandas-profiling>=2.8.0->pycaret) (2021.5.30)\nRequirement already satisfied: charset-normalizer~=2.0.0 in /opt/conda/lib/python3.6/site-packages (from requests>=2.24.0->pandas-profiling>=2.8.0->pycaret) (2.0.4)\nRequirement already satisfied: boto3 in /opt/conda/lib/python3.6/site-packages (from smart-open>=1.7.0->gensim<4.0.0->pycaret) (1.14.29)\nRequirement already satisfied: boto>=2.32 in /opt/conda/lib/python3.6/site-packages (from smart-open>=1.7.0->gensim<4.0.0->pycaret) (2.49.0)\nRequirement already satisfied: notebook>=4.4.1 in /opt/conda/lib/python3.6/site-packages (from widgetsnbextension~=3.5.0->ipywidgets->pycaret) (4.4.1)\nCollecting 
Mako\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b4/4d/e03d08f16ee10e688bde9016bc80af8b78c7f36a8b37c7194da48f72207e/Mako-1.1.6-py2.py3-none-any.whl (75 kB)\n     |████████████████████████████████| 75 kB 271 kB/s             \n\u001b[?25hCollecting importlib-resources\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/24/1b/33e489669a94da3ef4562938cd306e8fa915e13939d7b8277cb5569cb405/importlib_resources-5.4.0-py3-none-any.whl (28 kB)\nRequirement already satisfied: Werkzeug>=0.15 in /opt/conda/lib/python3.6/site-packages (from Flask->mlflow->pycaret) (0.15.4)\nRequirement already satisfied: itsdangerous>=0.24 in /opt/conda/lib/python3.6/site-packages (from Flask->mlflow->pycaret) (1.1.0)\nRequirement already satisfied: ptyprocess>=0.5 in /opt/conda/lib/python3.6/site-packages (from pexpect->IPython->pycaret) (0.6.0)\nRequirement already satisfied: prometheus-client in /opt/conda/lib/python3.6/site-packages (from prometheus-flask-exporter->mlflow->pycaret) (0.5.0)\nRequirement already satisfied: atomicwrites>=1.0 in /opt/conda/lib/python3.6/site-packages (from pytest->pyLDAvis->pycaret) (1.3.0)\nRequirement already satisfied: py>=1.5.0 in /opt/conda/lib/python3.6/site-packages (from pytest->pyLDAvis->pycaret) (1.8.0)\nRequirement already satisfied: more-itertools>=4.0.0 in /opt/conda/lib/python3.6/site-packages (from pytest->pyLDAvis->pycaret) (7.1.0)\nRequirement already satisfied: pluggy<1.0,>=0.12 in /opt/conda/lib/python3.6/site-packages (from pytest->pyLDAvis->pycaret) (0.12.0)\nCollecting smmap<6,>=3.0.1\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/6d/01/7caa71608bc29952ae09b0be63a539e50d2484bc37747797a66a60679856/smmap-5.0.0-py3-none-any.whl (24 kB)\nRequirement already satisfied: nbconvert in /opt/conda/lib/python3.6/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets->pycaret) (5.6.1)\nRequirement already satisfied: terminado>=0.3.3 in /opt/conda/lib/python3.6/site-packages (from 
notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets->pycaret) (0.8.1)\nRequirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /opt/conda/lib/python3.6/site-packages (from boto3->smart-open>=1.7.0->gensim<4.0.0->pycaret) (0.3.3)\nRequirement already satisfied: botocore<1.18.0,>=1.17.29 in /opt/conda/lib/python3.6/site-packages (from boto3->smart-open>=1.7.0->gensim<4.0.0->pycaret) (1.17.29)\nRequirement already satisfied: jmespath<1.0.0,>=0.7.1 in /opt/conda/lib/python3.6/site-packages (from boto3->smart-open>=1.7.0->gensim<4.0.0->pycaret) (0.10.0)\nRequirement already satisfied: PyWavelets in /opt/conda/lib/python3.6/site-packages (from imagehash->visions[type_image_path]==0.7.4->pandas-profiling>=2.8.0->pycaret) (1.0.3)\nRequirement already satisfied: pyzmq>=13 in /opt/conda/lib/python3.6/site-packages (from jupyter_client->ipykernel>=4.5.1->ipywidgets->pycaret) (19.0.1)\nRequirement already satisfied: docutils<0.16,>=0.10 in /opt/conda/lib/python3.6/site-packages (from botocore<1.18.0,>=1.17.29->boto3->smart-open>=1.7.0->gensim<4.0.0->pycaret) (0.15.2)\nCollecting botocore<1.18.0,>=1.17.29\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d1/96/5f11cca11d08703a5149bf668b2d455d5e633e12ae8e58a860f442b02112/botocore-1.17.63-py2.py3-none-any.whl (6.6 MB)\n     |████████████████████████████████| 6.6 MB 1.6 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/1f/df/38d177a527ce7882a4c3d6bb3b22ddb6dec304faec691f5c13c530cbccf2/botocore-1.17.62-py2.py3-none-any.whl (6.6 MB)\n     |████████████████████████████████| 6.6 MB 1.6 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/3c/5d/301310c616946f70240aa5a2e0f5c857c0e570a7c543a0d980e27ae64d18/botocore-1.17.61-py2.py3-none-any.whl (6.6 MB)\n     |████████████████████████████████| 6.6 MB 120.6 MB/s            \n\u001b[?25h  Downloading 
https://pypi.tuna.tsinghua.edu.cn/packages/e9/68/ed0b1d62bf7621a3cc4ff3ad4d55552250fc859da9efff56cf82e9dc2672/botocore-1.17.60-py2.py3-none-any.whl (6.6 MB)\n     |████████████████████████████████| 6.6 MB 1.3 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d8/0d/d2e551c9f25b1cafd546867dc142b8be4daea5815f53e30f39942a5b7dc9/botocore-1.17.59-py2.py3-none-any.whl (6.6 MB)\n     |████████████████████████████████| 6.6 MB 299 kB/s            ████▏                      | 1.9 MB 3.0 MB/s eta 0:00:02\n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/47/df/bf71709770d483abc9ca57a16c9015e37d4c0d8d3e337146f05dadd03d3e/botocore-1.17.58-py2.py3-none-any.whl (6.6 MB)\n     |████████████████████████████████| 6.6 MB 126.2 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b1/31/239d7fed71ab1c4eca938e88e61896d0811e616d73533a21c8e49bf0b785/botocore-1.17.57-py2.py3-none-any.whl (6.6 MB)\n     |████████████████████████████████| 6.6 MB 1.4 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b1/82/499909b818bddde1a4fc1228389d9d29cc2ede766a2a7370aed033dd07f9/botocore-1.17.56-py2.py3-none-any.whl (6.6 MB)\n     |████████████████████████████████| 6.6 MB 2.2 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/21/2c/644ffb20cb8185bdb40065d1a99bd4f6b8b23ed66c842529762b498551e8/botocore-1.17.55-py2.py3-none-any.whl (6.6 MB)\n     |████████████████████████████████| 6.6 MB 2.5 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/32/6e/0e6b758ae0e7549cadd64ee418ebbd410286738b9264e219a2f1b658ea66/botocore-1.17.54-py2.py3-none-any.whl (6.6 MB)\n     |████████████████████████████████| 6.6 MB 2.4 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ad/17/4a148846fc1fd8236e9742980d2b5faa35dd5b8c6d7f2f109a259fd2891f/botocore-1.17.53-py2.py3-none-any.whl (6.6 MB)\n     
|████████████████████████████████| 6.6 MB 117.6 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/90/ec/cf42af08ea52cb900ef4ce174752291ed6128a1eca3460efe31f73ec81c0/botocore-1.17.52-py2.py3-none-any.whl (6.6 MB)\n     |████████████████████████████████| 6.6 MB 2.2 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/17/87/28b2371e0e4d02a80c6748db2cde07f184ef574ad637677ba24ee049e081/botocore-1.17.51-py2.py3-none-any.whl (6.6 MB)\n     |████████████████████████████████| 6.6 MB 2.5 MB/s             | 4.6 MB 2.5 MB/s eta 0:00:01\n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/49/d7/8a7f3cf2fe32e9d33f0cb8548c3e83d68bc453b714850ee03d2606be5370/botocore-1.17.50-py2.py3-none-any.whl (6.6 MB)\n     |████████████████████████████████| 6.6 MB 113.8 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/81/05/7375707559879ff18839f83a2e9ea1a9f446047389af19eddc4040ca0215/botocore-1.17.49-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 18.7 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/1b/6c/3222ef7fbd77d89a374844b6f718f21993a7704496f4bf52251d573e2049/botocore-1.17.48-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 130 kB/s             eta 0:00:06\n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/eb/63/3dfa39fc55118cd6c29cd768b6e37fd29fdb7aa5f7b54ac3724c26fbadf4/botocore-1.17.47-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 112.7 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d6/68/a74efb7e55e9c0ca72092f334203eaa92197af4b68a84765c5b6e4376dbf/botocore-1.17.46-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 19.7 MB/s            \n\u001b[?25h  Downloading 
https://pypi.tuna.tsinghua.edu.cn/packages/b1/3b/914d73c869846880305f3cf7d562b97e79ecf05df32ab57b5fb6f3ba149a/botocore-1.17.45-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 122.9 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/1e/6a/b6490235c01c941a24a86235e2a641e9505cf0ce4b4968d4987573d92bec/botocore-1.17.44-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 2.4 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/7c/c1/e6675d8fd48c86612c0a8bf15e6c6b0aad43feb6308fb7f62102119d1304/botocore-1.17.43-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 18.8 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/8f/9d/ab7009f08a8f6f3d51afd54655e2c838e3eb799d77f595fb91b904532c93/botocore-1.17.42-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 112.2 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/da/ce/959a3cb0623bef9cb1c4c720d6a9a4b4c26bc719db27233701f2ebbf3626/botocore-1.17.41-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 118.0 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/3d/77/4f1f409c9c454ae798cff20744efacd5ca79059159272857636b6b560bf6/botocore-1.17.40-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 1.3 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/a7/87/633efe3a2737eb9f964c0b2237a1eb97de32a9de3ee0b77f93d6059d4c9f/botocore-1.17.39-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 2.3 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/2a/d2/c5f9834dc659eb78bad85b683915a31593137ab43c323ac0dc7a83e27e96/botocore-1.17.38-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 1.4 MB/s            
\n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/e8/da/55bdf7fff2ce578d38817515acf135cde6226d1604d6876fe840d44c386d/botocore-1.17.37-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 2.4 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/1b/5f/4b572dfed982e95500137d3e45a873a83fc114e910b78a989d4a3ceefa04/botocore-1.17.36-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 108.7 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/33/6a/e7008939a4b17d458d976fff0da62167b3c0f2e4015ebea09202261a092a/botocore-1.17.35-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 20.3 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/32/a7/a00aa203bec250a202eda62de98185de3095abc3a9f7ffc052cb42acf5a3/botocore-1.17.34-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 1.4 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ad/92/a134eb08fa2c96d105d22dd2f5fbf6af1e13d6fc349f76e61e07702e468a/botocore-1.17.33-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 2.4 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/28/ba/91e778ff842614ac99733e93098048a32eda1fcdc8e7101c09d6baeba0fe/botocore-1.17.32-py2.py3-none-any.whl (6.5 MB)\n     |████████████████████████████████| 6.5 MB 1.4 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ef/09/ad453cb97d14ba9434a863dbd12243e891fb13e22259fa9d30a904093fab/botocore-1.17.31-py2.py3-none-any.whl (6.4 MB)\n     |████████████████████████████████| 6.4 MB 2.4 MB/s            \n\u001b[?25h  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/62/94/f9582e581f2ca44e163da99532a8227a3c52530bb3921a27c47f522cb936/botocore-1.17.30-py2.py3-none-any.whl (6.4 MB)\n     |████████████████████████████████| 6.4 MB 
1.3 MB/s            \n\u001b[?25hINFO: pip is looking at multiple versions of wcwidth to determine which version is compatible with other requirements. This could take a while.\nCollecting wcwidth\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/fd/84/fd2ba7aafacbad3c4201d395674fc6348826569da3c0937e75505ead3528/wcwidth-0.2.13-py2.py3-none-any.whl (34 kB)\nINFO: pip is looking at multiple versions of visions to determine which version is compatible with other requirements. This could take a while.\nINFO: pip is looking at multiple versions of prometheus-client to determine which version is compatible with other requirements. This could take a while.\nCollecting prometheus-client\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ad/b3/6e18c89bf6bd120590ea538a62cae16dc763ff2745b18377c4be5495c4aa/prometheus_client-0.17.1-py3-none-any.whl (60 kB)\n     |████████████████████████████████| 60 kB 6.6 MB/s             \n\u001b[?25hINFO: pip is looking at multiple versions of mako to determine which version is compatible with other requirements. This could take a while.\nCollecting Mako\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/75/69/c3ab0db9234fa5681a85a1c55203763a62902d56ad76b6d9b9bfa2c83694/Mako-1.1.5-py2.py3-none-any.whl (75 kB)\n     |████████████████████████████████| 75 kB 464 kB/s             \n\u001b[?25hINFO: pip is looking at multiple versions of jupyter-core to determine which version is compatible with other requirements. This could take a while.\nCollecting jupyter-core\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/60/7d/bee50351fe3ff6979e949b9c4c00c556a7a9732ba39b547d07d93450de23/jupyter_core-4.9.2-py3-none-any.whl (86 kB)\n     |████████████████████████████████| 86 kB 583 kB/s             \n\u001b[?25hINFO: pip is looking at multiple versions of jupyter-client to determine which version is compatible with other requirements. 
This could take a while.\nCollecting jupyter_client\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/56/a7/f4d3790ce7bb925d3ffe299244501a264f23ee7ec401914f7d788881ea31/jupyter_client-7.1.2-py3-none-any.whl (130 kB)\n     |████████████████████████████████| 130 kB 420 kB/s            \n\u001b[?25hINFO: pip is looking at multiple versions of ipython-genutils to determine which version is compatible with other requirements. This could take a while.\nCollecting ipython-genutils\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/fa/bc/9bd3b5c2b4774d5f33b2d544f1460be9df7df2fe42f352135381c347c69a/ipython_genutils-0.2.0-py2.py3-none-any.whl (26 kB)\nINFO: pip is looking at multiple versions of importlib-resources to determine which version is compatible with other requirements. This could take a while.\nCollecting importlib-resources\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b1/8e/f29e92e403acda0e28789c0f994500239dff45065c3b28e3a2855afc4f9a/importlib_resources-5.3.0-py3-none-any.whl (28 kB)\nINFO: pip is looking at multiple versions of imagehash to determine which version is compatible with other requirements. This could take a while.\nCollecting imagehash\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/7b/61/987825bacf940190010e433c8bcd4701e141d958ad0e1e73ca5cf729ea4b/ImageHash-4.3.0-py2.py3-none-any.whl (296 kB)\n     |████████████████████████████████| 296 kB 653 kB/s            \n\u001b[?25hINFO: pip is looking at multiple versions of boto3 to determine which version is compatible with other requirements. 
This could take a while.\nCollecting boto3\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/75/ca/d917b244919f1ebf96f7bbd5a00e4641f7e9191b0d070258f5dc10f5eaad/boto3-1.23.10-py3-none-any.whl (132 kB)\n     |████████████████████████████████| 132 kB 979 kB/s            \n\u001b[?25hCollecting s3transfer<0.6.0,>=0.5.0\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/7b/9c/f51775ebe7df5a7aa4e7c79ed671bde94e154bd968aca8d65bb24aba0c8c/s3transfer-0.5.2-py3-none-any.whl (79 kB)\n     |████████████████████████████████| 79 kB 728 kB/s             \n\u001b[?25hCollecting botocore<1.27.0,>=1.26.10\n  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/09/b8/794e0bd260198538ded90c26b353ddb632eab01950d4e7e2e2b8ee510d12/botocore-1.26.10-py3-none-any.whl (8.8 MB)\n     |████████████████████████████████| 8.8 MB 1.2 MB/s            \n\u001b[?25hRequirement already satisfied: bleach in /opt/conda/lib/python3.6/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets->pycaret) (3.0.2)\nRequirement already satisfied: pandocfilters>=1.4.1 in /opt/conda/lib/python3.6/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets->pycaret) (1.4.2)\nRequirement already satisfied: defusedxml in /opt/conda/lib/python3.6/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets->pycaret) (0.6.0)\nRequirement already satisfied: mistune<2,>=0.8.1 in /opt/conda/lib/python3.6/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets->pycaret) (0.8.4)\nRequirement already satisfied: testpath in /opt/conda/lib/python3.6/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets->pycaret) (0.4.2)\nRequirement already satisfied: webencodings in /opt/conda/lib/python3.6/site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets->pycaret) (0.5.1)\nBuilding wheels for collected packages: cufflinks, pyod, 
databricks-cli, htmlmin\n  Building wheel for cufflinks (setup.py) ... \u001b[?25ldone\n\u001b[?25h  Created wheel for cufflinks: filename=cufflinks-0.17.3-py3-none-any.whl size=68797 sha256=ff9be4c540ff99531db91cfdcf53f086c26d0822be521db14e1a772db2ac6a17\n  Stored in directory: /home/mw/.cache/pip/wheels/f2/7b/f5/d76edf0f47aae04398503aa4197579f1dfab59c7962f6e06a6\n  Building wheel for pyod (setup.py) ... \u001b[?25ldone\n\u001b[?25h  Created wheel for pyod: filename=pyod-2.0.1-py3-none-any.whl size=206050 sha256=93e899f17cbafea9a5fffc18b7bafc49fcc14ab9c4361928800a208e9852adf3\n  Stored in directory: /home/mw/.cache/pip/wheels/78/40/b4/df0f0def05309b0eba68fe4b1a0343034111a98a39dad37828\n  Building wheel for databricks-cli (setup.py) ... \u001b[?25ldone\n\u001b[?25h  Created wheel for databricks-cli: filename=databricks_cli-0.17.8-py3-none-any.whl size=147822 sha256=2a498e48c59141c6be74826f9bfd1ccce1ceda75c84acce19240d01dadb2553a\n  Stored in directory: /home/mw/.cache/pip/wheels/31/66/52/076cf510803d2b40ca0e79854a1bc02ca2e7eab62a4bbbf393\n  Building wheel for htmlmin (setup.py) ... 
\u001b[?25ldone\n\u001b[?25h  Created wheel for htmlmin: filename=htmlmin-0.1.12-py3-none-any.whl size=27183 sha256=e9e8f17ead0fba3f710b1882cbea2311df6d16a9d96350a04359f338a433de9f\n  Stored in directory: /home/mw/.cache/pip/wheels/79/51/e9/89eed60df28e5d337133ee7ae66861f01b187a2d15cf5f1fad\nSuccessfully built cufflinks pyod databricks-cli htmlmin\nInstalling collected packages: markupsafe, jsonschema, urllib3, jinja2, typing-extensions, Pillow, numpy, botocore, threadpoolctl, tangled-up-in-unicode, smmap, scipy, s3transfer, pandas, networkx, multimethod, matplotlib, llvmlite, joblib, importlib-metadata, visions, tabulate, seaborn, scikit-learn, oauthlib, numba, Mako, importlib-resources, imagehash, gitdb, dataclasses, boto3, sqlparse, querystring-parser, pynndescent, pydantic, prometheus-flask-exporter, plotly, phik, missingno, htmlmin, gunicorn, gitpython, databricks-cli, alembic, yellowbrick, umap-learn, scikit-plot, pyod, pandas-profiling, mlxtend, mlflow, lightgbm, kmodes, imbalanced-learn, cufflinks, Boruta, pycaret\n  Attempting uninstall: markupsafe\n    Found existing installation: MarkupSafe 1.1.0\n    Uninstalling MarkupSafe-1.1.0:\n      Successfully uninstalled MarkupSafe-1.1.0\n  Attempting uninstall: jsonschema\n    Found existing installation: jsonschema 3.2.0\n    Uninstalling jsonschema-3.2.0:\n      Successfully uninstalled jsonschema-3.2.0\n  Attempting uninstall: urllib3\n    Found existing installation: urllib3 1.26.6\n    Uninstalling urllib3-1.26.6:\n      Successfully uninstalled urllib3-1.26.6\n  Attempting uninstall: jinja2\n    Found existing installation: Jinja2 2.10\n    Uninstalling Jinja2-2.10:\n      Successfully uninstalled Jinja2-2.10\n  Attempting uninstall: typing-extensions\n    Found existing installation: typing-extensions 3.7.4\n    Uninstalling typing-extensions-3.7.4:\n      Successfully uninstalled typing-extensions-3.7.4\n  Attempting uninstall: Pillow\n    Found existing installation: Pillow 5.3.0\n    Uninstalling 
Pillow-5.3.0:\n      Successfully uninstalled Pillow-5.3.0\n  Attempting uninstall: numpy\n    Found existing installation: numpy 1.16.3\n    Uninstalling numpy-1.16.3:\n      Successfully uninstalled numpy-1.16.3\n  Attempting uninstall: botocore\n    Found existing installation: botocore 1.17.29\n    Uninstalling botocore-1.17.29:\n      Successfully uninstalled botocore-1.17.29\n  Attempting uninstall: scipy\n    Found existing installation: scipy 1.2.0\n    Uninstalling scipy-1.2.0:\n      Successfully uninstalled scipy-1.2.0\n  Attempting uninstall: s3transfer\n    Found existing installation: s3transfer 0.3.3\n    Uninstalling s3transfer-0.3.3:\n      Successfully uninstalled s3transfer-0.3.3\n  Attempting uninstall: pandas\n    Found existing installation: pandas 0.24.2\n    Uninstalling pandas-0.24.2:\n      Successfully uninstalled pandas-0.24.2\n  Attempting uninstall: networkx\n    Found existing installation: networkx 2.3\n    Uninstalling networkx-2.3:\n      Successfully uninstalled networkx-2.3\n  Attempting uninstall: matplotlib\n    Found existing installation: matplotlib 3.1.2\n    Uninstalling matplotlib-3.1.2:\n      Successfully uninstalled matplotlib-3.1.2\n  Attempting uninstall: llvmlite\n    Found existing installation: llvmlite 0.29.0\n    Uninstalling llvmlite-0.29.0:\n      Successfully uninstalled llvmlite-0.29.0\n  Attempting uninstall: joblib\n    Found existing installation: joblib 0.13.2\n    Uninstalling joblib-0.13.2:\n      Successfully uninstalled joblib-0.13.2\n  Attempting uninstall: importlib-metadata\n    Found existing installation: importlib-metadata 1.7.0\n    Uninstalling importlib-metadata-1.7.0:\n      Successfully uninstalled importlib-metadata-1.7.0\n  Attempting uninstall: seaborn\n    Found existing installation: seaborn 0.9.0\n    Uninstalling seaborn-0.9.0:\n      Successfully uninstalled seaborn-0.9.0\n  Attempting uninstall: scikit-learn\n    Found existing installation: scikit-learn 0.21.1\n    Uninstalling 
scikit-learn-0.21.1:\n      Successfully uninstalled scikit-learn-0.21.1\n  Attempting uninstall: numba\n    Found existing installation: numba 0.44.1\n    Uninstalling numba-0.44.1:\n      Successfully uninstalled numba-0.44.1\n  Attempting uninstall: boto3\n    Found existing installation: boto3 1.14.29\n    Uninstalling boto3-1.14.29:\n      Successfully uninstalled boto3-1.14.29\n  Attempting uninstall: sqlparse\n    Found existing installation: sqlparse 0.3.0\n    Uninstalling sqlparse-0.3.0:\n      Successfully uninstalled sqlparse-0.3.0\n  Attempting uninstall: plotly\n    Found existing installation: plotly 3.9.0\n    Uninstalling plotly-3.9.0:\n      Successfully uninstalled plotly-3.9.0\n  Attempting uninstall: missingno\n    Found existing installation: missingno 0.4.0\n    Uninstalling missingno-0.4.0:\n      Successfully uninstalled missingno-0.4.0\n  Attempting uninstall: pandas-profiling\n    Found existing installation: pandas-profiling 1.4.2\n    Uninstalling pandas-profiling-1.4.2:\n      Successfully uninstalled pandas-profiling-1.4.2\n  Attempting uninstall: mlxtend\n    Found existing installation: mlxtend 0.16.0\n    Uninstalling mlxtend-0.16.0:\n      Successfully uninstalled mlxtend-0.16.0\n  Attempting uninstall: lightgbm\n    Found existing installation: lightgbm 2.2.3\n    Uninstalling lightgbm-2.2.3:\n      Successfully uninstalled lightgbm-2.2.3\n  Attempting uninstall: cufflinks\n    Found existing installation: cufflinks 0.12.1\n    Uninstalling cufflinks-0.12.1:\n      Successfully uninstalled cufflinks-0.12.1\n\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\npaddlepaddle 1.5.0 requires matplotlib<=2.2.4, but you have matplotlib 3.3.4 which is incompatible.\npaddlepaddle 1.5.0 requires nltk<=3.4,>=3.2.2, but you have nltk 3.4.1 which is incompatible.\npaddlepaddle 1.5.0 requires scipy<=1.2.1,>=0.19.0, but you have scipy 1.5.4 which is incompatible.\nmxnet 1.4.1 requires numpy<1.15.0,>=1.8.2, but you have numpy 1.19.5 which is incompatible.\nmoto 1.3.9 requires idna<2.9,>=2.5, but you have idna 3.2 which is incompatible.\nmoto 1.3.9 requires PyYAML==3.13, but you have pyyaml 5.3.1 which is incompatible.\nmodin 0.5.0 requires pandas==0.24.2, but you have pandas 1.1.5 which is incompatible.\njupyter-kernel-gateway 1.2.0 requires jupyter-client<5.0,>=4.2.0, but you have jupyter-client 6.1.6 which is incompatible.\nawscli 1.18.106 requires botocore==1.17.29, but you have botocore 1.26.10 which is incompatible.\nawscli 1.18.106 requires s3transfer<0.4.0,>=0.3.0, but you have s3transfer 0.5.2 which is incompatible.\nauto-sklearn 0.5.2 requires scikit-learn<0.20,>=0.19, but you have scikit-learn 0.23.2 which is incompatible.\u001b[0m\nSuccessfully installed Boruta-0.4.3 Mako-1.1.6 Pillow-8.4.0 alembic-1.7.7 boto3-1.23.10 botocore-1.26.10 cufflinks-0.17.3 databricks-cli-0.17.8 dataclasses-0.8 gitdb-4.0.9 gitpython-3.1.18 gunicorn-21.2.0 htmlmin-0.1.12 imagehash-4.3.1 imbalanced-learn-0.7.0 importlib-metadata-4.8.3 importlib-resources-5.4.0 jinja2-3.0.3 joblib-1.0.1 jsonschema-3.0.2 kmodes-0.12.2 lightgbm-3.3.5 llvmlite-0.36.0 markupsafe-2.0.1 matplotlib-3.3.4 missingno-0.5.2 mlflow-1.23.1 mlxtend-0.19.0 multimethod-1.5 networkx-2.5.1 numba-0.53.1 numpy-1.19.5 oauthlib-3.2.2 pandas-1.1.5 pandas-profiling-3.1.0 phik-0.12.0 plotly-5.18.0 prometheus-flask-exporter-0.23.1 pycaret-2.3.10 pydantic-1.9.2 pynndescent-0.5.13 pyod-2.0.1 querystring-parser-1.2.4 s3transfer-0.5.2 scikit-learn-0.23.2 scikit-plot-0.3.7 scipy-1.5.4 seaborn-0.11.2 smmap-5.0.0 sqlparse-0.4.4 
tabulate-0.8.10 tangled-up-in-unicode-0.1.0 threadpoolctl-3.1.0 typing-extensions-4.1.1 umap-learn-0.5.6 urllib3-1.26.19 visions-0.7.4 yellowbrick-1.3.post1\n"}],"execution_count":2},{"cell_type":"code","metadata":{"id":"810CF482643D4985AFA558B9F60AF78D","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true,"collapsed":false,"scrolled":false},"source":"import math\nimport pandas as pd\nimport numpy as np\nfrom datetime import datetime\nfrom scipy.stats import pearsonr\nfrom sklearn.preprocessing import LabelEncoder, MinMaxScaler\nfrom sklearn.metrics import mean_squared_error, r2_score, explained_variance_score, mean_absolute_error\nfrom pycaret.regression import *\n\nimport warnings\nwarnings.filterwarnings('ignore')","outputs":[{"output_type":"stream","name":"stderr","text":"/opt/conda/lib/python3.6/site-packages/dask/dataframe/utils.py:15: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.\n  import pandas.util.testing as tm\n/opt/conda/lib/python3.6/site-packages/pycaret/loggers/mlflow_logger.py:14: FutureWarning: MLflow support for Python 3.6 is deprecated and will be dropped in an upcoming release. At that point, existing Python 3.6 workflows that use MLflow will continue to work without modification, but Python 3.6 users will no longer get access to the latest MLflow features and bugfixes. 
We recommend that you upgrade to Python 3.7 or newer.\n  import mlflow\n"}],"execution_count":3},{"cell_type":"markdown","metadata":{"id":"F957615004774C2FB0A8B7C22D32E5C1","notebookId":"66cecedfc175d1eb8493abb3","runtime":{"status":"default","execution_status":null,"is_visible":false},"jupyter":{},"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"}},"source":"## Load the training data  \n### Process the weather data"},{"cell_type":"code","metadata":{"trusted":true,"collapsed":false,"jupyter":{},"tags":[],"slideshow":{"slide_type":"slide"},"id":"4FC61B519BE24DD590C9913A15FBDA95","scrolled":false,"notebookId":"66cecedfc175d1eb8493abb3"},"source":"def read_weather_data(file,station):\n    # Read the meteorological data from the CSV path passed in as `file`\n    df_weather = pd.read_csv(file)\n    # Convert the time column to datetime\n    df_weather['time'] = pd.to_datetime(df_weather['time'])\n    # Replace the sentinel codes that mark missing values with NaN\n    df_weather[df_weather==999999.0]=np.nan\n    df_weather[df_weather==999017.0]=np.nan\n    df_weather[df_weather==999990.0]=np.nan\n    # Keep only the rows for the requested station\n    df_weather = df_weather[df_weather['station']==station]\n    return df_weather","outputs":[],"execution_count":4},{"cell_type":"code","metadata":{"trusted":true,"collapsed":false,"jupyter":{},"tags":[],"slideshow":{"slide_type":"slide"},"id":"16BEDEF83DF24B65ABB0E52615CECDCB","scrolled":false,"notebookId":"66cecedfc175d1eb8493abb3"},"source":"# Load the weather data for station A\ndf_A_weather = read_weather_data('/home/mw/input/ozone/train_weather.csv','A')\ndf_A_weather","outputs":[{"output_type":"execute_result","data":{"text/plain":"     station                time  pressure     wd   ws   tem    rh  rain\n0          A 2016-06-01 00:00:00    1002.2  120.0  2.4  22.3  97.0   0.6\n1          A 2016-06-01 01:00:00    1002.3   37.0  0.7  22.4  98.0  10.5\n2          A 2016-06-01 02:00:00    1002.4   56.0  1.7  22.1  97.0   0.4\n3          A 2016-06-01 03:00:00    1002.0   78.0  2.4  21.1  97.0   0.0\n4          A 2016-06-01 04:00:00    1002.0   66.0  1.8  20.5  97.0   0.0\n...      ...                 ...       
...    ...  ...   ...   ...   ...\n3139       A 2017-07-09 19:00:00    1003.0  220.0  1.9  31.7  60.0   0.0\n3140       A 2017-07-09 20:00:00    1003.3  178.0  2.0  31.2  64.0   0.0\n3141       A 2017-07-09 21:00:00    1003.6  174.0  1.4  30.4  73.0   0.0\n3142       A 2017-07-09 22:00:00    1003.3  183.0  1.1  30.1  72.0   0.0\n3143       A 2017-07-09 23:00:00    1002.7  191.0  2.0  29.8  78.0   0.0\n\n[3144 rows x 8 columns]","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>station</th>\n      <th>time</th>\n      <th>pressure</th>\n      <th>wd</th>\n      <th>ws</th>\n      <th>tem</th>\n      <th>rh</th>\n      <th>rain</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>A</td>\n      <td>2016-06-01 00:00:00</td>\n      <td>1002.2</td>\n      <td>120.0</td>\n      <td>2.4</td>\n      <td>22.3</td>\n      <td>97.0</td>\n      <td>0.6</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>A</td>\n      <td>2016-06-01 01:00:00</td>\n      <td>1002.3</td>\n      <td>37.0</td>\n      <td>0.7</td>\n      <td>22.4</td>\n      <td>98.0</td>\n      <td>10.5</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>A</td>\n      <td>2016-06-01 02:00:00</td>\n      <td>1002.4</td>\n      <td>56.0</td>\n      <td>1.7</td>\n      <td>22.1</td>\n      <td>97.0</td>\n      <td>0.4</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>A</td>\n      <td>2016-06-01 03:00:00</td>\n      <td>1002.0</td>\n      <td>78.0</td>\n      <td>2.4</td>\n      <td>21.1</td>\n      <td>97.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>A</td>\n      <td>2016-06-01 04:00:00</td>\n      <td>1002.0</td>\n      
<td>66.0</td>\n      <td>1.8</td>\n      <td>20.5</td>\n      <td>97.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>3139</th>\n      <td>A</td>\n      <td>2017-07-09 19:00:00</td>\n      <td>1003.0</td>\n      <td>220.0</td>\n      <td>1.9</td>\n      <td>31.7</td>\n      <td>60.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>3140</th>\n      <td>A</td>\n      <td>2017-07-09 20:00:00</td>\n      <td>1003.3</td>\n      <td>178.0</td>\n      <td>2.0</td>\n      <td>31.2</td>\n      <td>64.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>3141</th>\n      <td>A</td>\n      <td>2017-07-09 21:00:00</td>\n      <td>1003.6</td>\n      <td>174.0</td>\n      <td>1.4</td>\n      <td>30.4</td>\n      <td>73.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>3142</th>\n      <td>A</td>\n      <td>2017-07-09 22:00:00</td>\n      <td>1003.3</td>\n      <td>183.0</td>\n      <td>1.1</td>\n      <td>30.1</td>\n      <td>72.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>3143</th>\n      <td>A</td>\n      <td>2017-07-09 23:00:00</td>\n      <td>1002.7</td>\n      <td>191.0</td>\n      <td>2.0</td>\n      <td>29.8</td>\n      <td>78.0</td>\n      <td>0.0</td>\n    </tr>\n  </tbody>\n</table>\n<p>3144 rows × 8 columns</p>\n</div>"},"metadata":{}}],"execution_count":5},{"cell_type":"code","metadata":{"id":"552FE38E97E045829832768837C798DC","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"# Read the weather data for station B\ndf_B_weather = read_weather_data('/home/mw/input/ozone/train_weather.csv','B')\ndf_B_weather","outputs":[{"output_type":"execute_result","data":{"text/plain":"     station                time  pressure     wd   ws   tem    rh  
rain\n3144       B 2016-06-01 00:00:00    1002.3   21.0  0.4  23.1  94.0   3.0\n3145       B 2016-06-01 01:00:00    1002.2    NaN  0.2  22.9  97.0  38.2\n3146       B 2016-06-01 02:00:00    1002.2   44.0  0.5  22.8  96.0   0.0\n3147       B 2016-06-01 03:00:00    1001.5   50.0  0.8  21.9  95.0   0.0\n3148       B 2016-06-01 04:00:00    1002.0   64.0  0.8  21.2  92.0   0.0\n...      ...                 ...       ...    ...  ...   ...   ...   ...\n6283       B 2017-07-09 19:00:00    1003.1  329.0  0.5  31.7  58.0   0.0\n6284       B 2017-07-09 20:00:00    1003.3  279.0  0.4  31.7  59.0   0.0\n6285       B 2017-07-09 21:00:00    1003.7  282.0  0.7  31.3  62.0   0.0\n6286       B 2017-07-09 22:00:00    1003.3  295.0  0.3  30.7  64.0   0.0\n6287       B 2017-07-09 23:00:00    1002.7  331.0  0.5  30.4  72.0   0.0\n\n[3144 rows x 8 columns]","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>station</th>\n      <th>time</th>\n      <th>pressure</th>\n      <th>wd</th>\n      <th>ws</th>\n      <th>tem</th>\n      <th>rh</th>\n      <th>rain</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>3144</th>\n      <td>B</td>\n      <td>2016-06-01 00:00:00</td>\n      <td>1002.3</td>\n      <td>21.0</td>\n      <td>0.4</td>\n      <td>23.1</td>\n      <td>94.0</td>\n      <td>3.0</td>\n    </tr>\n    <tr>\n      <th>3145</th>\n      <td>B</td>\n      <td>2016-06-01 01:00:00</td>\n      <td>1002.2</td>\n      <td>NaN</td>\n      <td>0.2</td>\n      <td>22.9</td>\n      <td>97.0</td>\n      <td>38.2</td>\n    </tr>\n    <tr>\n      <th>3146</th>\n      <td>B</td>\n      <td>2016-06-01 02:00:00</td>\n      <td>1002.2</td>\n      <td>44.0</td>\n      
<td>0.5</td>\n      <td>22.8</td>\n      <td>96.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>3147</th>\n      <td>B</td>\n      <td>2016-06-01 03:00:00</td>\n      <td>1001.5</td>\n      <td>50.0</td>\n      <td>0.8</td>\n      <td>21.9</td>\n      <td>95.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>3148</th>\n      <td>B</td>\n      <td>2016-06-01 04:00:00</td>\n      <td>1002.0</td>\n      <td>64.0</td>\n      <td>0.8</td>\n      <td>21.2</td>\n      <td>92.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>6283</th>\n      <td>B</td>\n      <td>2017-07-09 19:00:00</td>\n      <td>1003.1</td>\n      <td>329.0</td>\n      <td>0.5</td>\n      <td>31.7</td>\n      <td>58.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>6284</th>\n      <td>B</td>\n      <td>2017-07-09 20:00:00</td>\n      <td>1003.3</td>\n      <td>279.0</td>\n      <td>0.4</td>\n      <td>31.7</td>\n      <td>59.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>6285</th>\n      <td>B</td>\n      <td>2017-07-09 21:00:00</td>\n      <td>1003.7</td>\n      <td>282.0</td>\n      <td>0.7</td>\n      <td>31.3</td>\n      <td>62.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>6286</th>\n      <td>B</td>\n      <td>2017-07-09 22:00:00</td>\n      <td>1003.3</td>\n      <td>295.0</td>\n      <td>0.3</td>\n      <td>30.7</td>\n      <td>64.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>6287</th>\n      <td>B</td>\n      <td>2017-07-09 23:00:00</td>\n      <td>1002.7</td>\n      <td>331.0</td>\n      <td>0.5</td>\n      <td>30.4</td>\n      <td>72.0</td>\n      <td>0.0</td>\n    </tr>\n  </tbody>\n</table>\n<p>3144 rows × 8 
columns</p>\n</div>"},"metadata":{}}],"execution_count":6},{"cell_type":"code","metadata":{"id":"F2776A98120C48FD934BDCD2ADAD107C","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"# Concatenate the weather data for stations A and B\ndf_weather = pd.concat([df_A_weather,df_B_weather])\ndf_weather","outputs":[{"output_type":"execute_result","data":{"text/plain":"     station                time  pressure     wd   ws   tem    rh  rain\n0          A 2016-06-01 00:00:00    1002.2  120.0  2.4  22.3  97.0   0.6\n1          A 2016-06-01 01:00:00    1002.3   37.0  0.7  22.4  98.0  10.5\n2          A 2016-06-01 02:00:00    1002.4   56.0  1.7  22.1  97.0   0.4\n3          A 2016-06-01 03:00:00    1002.0   78.0  2.4  21.1  97.0   0.0\n4          A 2016-06-01 04:00:00    1002.0   66.0  1.8  20.5  97.0   0.0\n...      ...                 ...       ...    ...  ...   ...   ...   ...\n6283       B 2017-07-09 19:00:00    1003.1  329.0  0.5  31.7  58.0   0.0\n6284       B 2017-07-09 20:00:00    1003.3  279.0  0.4  31.7  59.0   0.0\n6285       B 2017-07-09 21:00:00    1003.7  282.0  0.7  31.3  62.0   0.0\n6286       B 2017-07-09 22:00:00    1003.3  295.0  0.3  30.7  64.0   0.0\n6287       B 2017-07-09 23:00:00    1002.7  331.0  0.5  30.4  72.0   0.0\n\n[6288 rows x 8 columns]","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>station</th>\n      <th>time</th>\n      <th>pressure</th>\n      <th>wd</th>\n      <th>ws</th>\n      <th>tem</th>\n      <th>rh</th>\n      <th>rain</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>A</td>\n      <td>2016-06-01 
00:00:00</td>\n      <td>1002.2</td>\n      <td>120.0</td>\n      <td>2.4</td>\n      <td>22.3</td>\n      <td>97.0</td>\n      <td>0.6</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>A</td>\n      <td>2016-06-01 01:00:00</td>\n      <td>1002.3</td>\n      <td>37.0</td>\n      <td>0.7</td>\n      <td>22.4</td>\n      <td>98.0</td>\n      <td>10.5</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>A</td>\n      <td>2016-06-01 02:00:00</td>\n      <td>1002.4</td>\n      <td>56.0</td>\n      <td>1.7</td>\n      <td>22.1</td>\n      <td>97.0</td>\n      <td>0.4</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>A</td>\n      <td>2016-06-01 03:00:00</td>\n      <td>1002.0</td>\n      <td>78.0</td>\n      <td>2.4</td>\n      <td>21.1</td>\n      <td>97.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>A</td>\n      <td>2016-06-01 04:00:00</td>\n      <td>1002.0</td>\n      <td>66.0</td>\n      <td>1.8</td>\n      <td>20.5</td>\n      <td>97.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>6283</th>\n      <td>B</td>\n      <td>2017-07-09 19:00:00</td>\n      <td>1003.1</td>\n      <td>329.0</td>\n      <td>0.5</td>\n      <td>31.7</td>\n      <td>58.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>6284</th>\n      <td>B</td>\n      <td>2017-07-09 20:00:00</td>\n      <td>1003.3</td>\n      <td>279.0</td>\n      <td>0.4</td>\n      <td>31.7</td>\n      <td>59.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>6285</th>\n      <td>B</td>\n      <td>2017-07-09 21:00:00</td>\n      <td>1003.7</td>\n      <td>282.0</td>\n      <td>0.7</td>\n      <td>31.3</td>\n      <td>62.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>6286</th>\n      <td>B</td>\n      <td>2017-07-09 22:00:00</td>\n      
<td>1003.3</td>\n      <td>295.0</td>\n      <td>0.3</td>\n      <td>30.7</td>\n      <td>64.0</td>\n      <td>0.0</td>\n    </tr>\n    <tr>\n      <th>6287</th>\n      <td>B</td>\n      <td>2017-07-09 23:00:00</td>\n      <td>1002.7</td>\n      <td>331.0</td>\n      <td>0.5</td>\n      <td>30.4</td>\n      <td>72.0</td>\n      <td>0.0</td>\n    </tr>\n  </tbody>\n</table>\n<p>6288 rows × 8 columns</p>\n</div>"},"metadata":{}}],"execution_count":12},{"cell_type":"markdown","metadata":{"id":"6C1EBBBC43F8462784CEB4DF6A6CBFBD","notebookId":"66cecedfc175d1eb8493abb3","runtime":{"status":"default","execution_status":null,"is_visible":false},"jupyter":{},"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"}},"source":"### Process the air-pollution data  \nThe air-pollutant concentration table is organized differently from the weather table, so it is reshaped here so that it can be joined with the weather data."},{"cell_type":"code","metadata":{"id":"0821005AB8AC4372B19EED3F60412C80","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"def read_air_data(file,station):\n    # Read the air-pollutant concentration data from the given CSV file\n    df_air = pd.read_csv(file)\n    # List all pollutant types\n    pollution_list = df_air.type.unique()\n\n    # Extract PM2.5\n    df_PM25 = df_air[df_air['type']=='PM2.5'].loc[:,['date','hour','Station ' + station]]\n    df_PM25.rename(columns = {'Station ' + station:'PM2.5'},inplace=True)\n\n    # Extract PM10\n    df_PM10 = df_air[df_air['type']=='PM10'].loc[:,['date','hour','Station ' + station]]\n    df_PM10.rename(columns = {'Station ' + station:'PM10'},inplace=True)\n\n    # Extract SO2\n    df_SO2 = df_air[df_air['type']=='SO2'].loc[:,['date','hour','Station ' + station]]\n    df_SO2.rename(columns = {'Station ' + station:'SO2'},inplace=True)\n\n    # Extract NO2\n    df_NO2 = df_air[df_air['type']=='NO2'].loc[:,['date','hour','Station ' + station]]\n    df_NO2.rename(columns = {'Station ' + station:'NO2'},inplace=True)\n\n    # Extract CO\n    df_CO = 
df_air[df_air['type']=='CO'].loc[:,['date','hour','Station ' + station]]\n    df_CO.rename(columns = {'Station ' + station:'CO'},inplace=True)\n    df_CO.reset_index(drop=True,inplace=True)\n\n    # Extract O3\n    df_O3 = df_air[df_air['type']=='O3'].loc[:,['date','hour','Station ' + station]]\n    df_O3.rename(columns = {'Station ' + station:'O3'},inplace=True)\n    df_O3.reset_index(drop=True,inplace=True)\n\n    # PM2.5 has one more row than the other pollutants, so pd.concat() would misalign rows;\n    # join on the key columns with pd.merge() instead\n    df_all = pd.merge(df_PM25,df_PM10, how='left', on=['date','hour'])\n    df_all = pd.merge(df_all,df_NO2, how='left', on=['date','hour'])\n    df_all = pd.merge(df_all,df_SO2, how='left', on=['date','hour'])\n    df_all = pd.merge(df_all,df_CO, how='left', on=['date','hour'])\n    df_all = pd.merge(df_all,df_O3, how='left', on=['date','hour'])\n\n    # Build the time column, then drop the original date and hour columns\n    df_all['date'] = df_all['date'].astype(str)\n    df_all['hour'] = df_all['hour'].astype(str)\n    df_all['time'] = pd.to_datetime(df_all['date'] +  df_all['hour'].str.zfill(2),format='%Y%m%d%H')\n    df_all.drop(columns=['date','hour'],inplace=True)\n\n    # Add the station column\n    df_all['station'] = station\n\n    return df_all","outputs":[],"execution_count":7},{"cell_type":"code","metadata":{"id":"57674D71189E499495963E5FE04B9232","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"# Read the air-pollution data for station A\ndf_A_air = read_air_data('/home/mw/input/ozone/train_air.csv','A')\ndf_A_air","outputs":[{"output_type":"execute_result","data":{"text/plain":"      PM2.5  PM10   NO2   SO2     CO     O3                time station\n0      54.0   NaN  17.0  16.0  0.824   90.0 2016-06-01 00:00:00       A\n1      34.0   NaN  14.0  16.0  0.774   83.0 2016-06-01 01:00:00       A\n2      22.0   NaN  19.0  16.0  0.787   78.0 2016-06-01 02:00:00       A\n3      20.0   NaN  13.0  16.0  0.750  115.0 2016-06-01 03:00:00       A\n4       4.0   
NaN   9.0  15.0  0.770  125.0 2016-06-01 04:00:00       A\n...     ...   ...   ...   ...    ...    ...                 ...     ...\n3062   17.0  41.0  32.0  13.0  0.400   26.0 2017-07-09 07:00:00       A\n3063   14.0  45.0  34.0  20.0  0.400   61.0 2017-07-09 20:00:00       A\n3064   24.0  51.0  37.0  15.0  0.400   58.0 2017-07-09 21:00:00       A\n3065   24.0  52.0  38.0  15.0  0.500   51.0 2017-07-09 22:00:00       A\n3066   23.0  50.0  36.0  17.0  0.500   45.0 2017-07-09 23:00:00       A\n\n[3067 rows x 8 columns]","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>PM2.5</th>\n      <th>PM10</th>\n      <th>NO2</th>\n      <th>SO2</th>\n      <th>CO</th>\n      <th>O3</th>\n      <th>time</th>\n      <th>station</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>54.0</td>\n      <td>NaN</td>\n      <td>17.0</td>\n      <td>16.0</td>\n      <td>0.824</td>\n      <td>90.0</td>\n      <td>2016-06-01 00:00:00</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>34.0</td>\n      <td>NaN</td>\n      <td>14.0</td>\n      <td>16.0</td>\n      <td>0.774</td>\n      <td>83.0</td>\n      <td>2016-06-01 01:00:00</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>22.0</td>\n      <td>NaN</td>\n      <td>19.0</td>\n      <td>16.0</td>\n      <td>0.787</td>\n      <td>78.0</td>\n      <td>2016-06-01 02:00:00</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>20.0</td>\n      <td>NaN</td>\n      <td>13.0</td>\n      <td>16.0</td>\n      <td>0.750</td>\n      <td>115.0</td>\n      <td>2016-06-01 03:00:00</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      
<th>4</th>\n      <td>4.0</td>\n      <td>NaN</td>\n      <td>9.0</td>\n      <td>15.0</td>\n      <td>0.770</td>\n      <td>125.0</td>\n      <td>2016-06-01 04:00:00</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>3062</th>\n      <td>17.0</td>\n      <td>41.0</td>\n      <td>32.0</td>\n      <td>13.0</td>\n      <td>0.400</td>\n      <td>26.0</td>\n      <td>2017-07-09 07:00:00</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>3063</th>\n      <td>14.0</td>\n      <td>45.0</td>\n      <td>34.0</td>\n      <td>20.0</td>\n      <td>0.400</td>\n      <td>61.0</td>\n      <td>2017-07-09 20:00:00</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>3064</th>\n      <td>24.0</td>\n      <td>51.0</td>\n      <td>37.0</td>\n      <td>15.0</td>\n      <td>0.400</td>\n      <td>58.0</td>\n      <td>2017-07-09 21:00:00</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>3065</th>\n      <td>24.0</td>\n      <td>52.0</td>\n      <td>38.0</td>\n      <td>15.0</td>\n      <td>0.500</td>\n      <td>51.0</td>\n      <td>2017-07-09 22:00:00</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>3066</th>\n      <td>23.0</td>\n      <td>50.0</td>\n      <td>36.0</td>\n      <td>17.0</td>\n      <td>0.500</td>\n      <td>45.0</td>\n      <td>2017-07-09 23:00:00</td>\n      <td>A</td>\n    </tr>\n  </tbody>\n</table>\n<p>3067 rows × 8 columns</p>\n</div>"},"metadata":{}}],"execution_count":8},{"cell_type":"code","metadata":{"id":"E93882268DEB41178D9A7A82B31F46A6","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"# Read the air-pollution data for station B\ndf_B_air = 
read_air_data('/home/mw/input/ozone/train_air.csv','B')\ndf_B_air","outputs":[{"output_type":"execute_result","data":{"text/plain":"      PM2.5  PM10   NO2  SO2     CO     O3                time station\n0      59.0   NaN  33.0  5.0  0.762   68.0 2016-06-01 00:00:00       B\n1      38.0   NaN  29.0  6.0  0.695   65.0 2016-06-01 01:00:00       B\n2      21.0   NaN  27.0  5.0  0.805   68.0 2016-06-01 02:00:00       B\n3      17.0   NaN  23.0  4.0  0.641   80.0 2016-06-01 03:00:00       B\n4      10.0   NaN  12.0  4.0  0.605  107.0 2016-06-01 04:00:00       B\n...     ...   ...   ...  ...    ...    ...                 ...     ...\n3062   18.0  42.0  31.0  8.0  0.700   46.0 2017-07-09 07:00:00       B\n3063   18.0  34.0  26.0  7.0  0.700   76.0 2017-07-09 20:00:00       B\n3064   17.0  48.0  24.0  6.0  0.700   74.0 2017-07-09 21:00:00       B\n3065   17.0  42.0  20.0  6.0  0.600   67.0 2017-07-09 22:00:00       B\n3066   17.0  47.0  23.0  6.0  0.700   70.0 2017-07-09 23:00:00       B\n\n[3067 rows x 8 columns]","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>PM2.5</th>\n      <th>PM10</th>\n      <th>NO2</th>\n      <th>SO2</th>\n      <th>CO</th>\n      <th>O3</th>\n      <th>time</th>\n      <th>station</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>59.0</td>\n      <td>NaN</td>\n      <td>33.0</td>\n      <td>5.0</td>\n      <td>0.762</td>\n      <td>68.0</td>\n      <td>2016-06-01 00:00:00</td>\n      <td>B</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>38.0</td>\n      <td>NaN</td>\n      <td>29.0</td>\n      <td>6.0</td>\n      <td>0.695</td>\n      <td>65.0</td>\n      <td>2016-06-01 
01:00:00</td>\n      <td>B</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>21.0</td>\n      <td>NaN</td>\n      <td>27.0</td>\n      <td>5.0</td>\n      <td>0.805</td>\n      <td>68.0</td>\n      <td>2016-06-01 02:00:00</td>\n      <td>B</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>17.0</td>\n      <td>NaN</td>\n      <td>23.0</td>\n      <td>4.0</td>\n      <td>0.641</td>\n      <td>80.0</td>\n      <td>2016-06-01 03:00:00</td>\n      <td>B</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>10.0</td>\n      <td>NaN</td>\n      <td>12.0</td>\n      <td>4.0</td>\n      <td>0.605</td>\n      <td>107.0</td>\n      <td>2016-06-01 04:00:00</td>\n      <td>B</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>3062</th>\n      <td>18.0</td>\n      <td>42.0</td>\n      <td>31.0</td>\n      <td>8.0</td>\n      <td>0.700</td>\n      <td>46.0</td>\n      <td>2017-07-09 07:00:00</td>\n      <td>B</td>\n    </tr>\n    <tr>\n      <th>3063</th>\n      <td>18.0</td>\n      <td>34.0</td>\n      <td>26.0</td>\n      <td>7.0</td>\n      <td>0.700</td>\n      <td>76.0</td>\n      <td>2017-07-09 20:00:00</td>\n      <td>B</td>\n    </tr>\n    <tr>\n      <th>3064</th>\n      <td>17.0</td>\n      <td>48.0</td>\n      <td>24.0</td>\n      <td>6.0</td>\n      <td>0.700</td>\n      <td>74.0</td>\n      <td>2017-07-09 21:00:00</td>\n      <td>B</td>\n    </tr>\n    <tr>\n      <th>3065</th>\n      <td>17.0</td>\n      <td>42.0</td>\n      <td>20.0</td>\n      <td>6.0</td>\n      <td>0.600</td>\n      <td>67.0</td>\n      <td>2017-07-09 22:00:00</td>\n      <td>B</td>\n    </tr>\n    <tr>\n      <th>3066</th>\n      <td>17.0</td>\n      <td>47.0</td>\n      <td>23.0</td>\n      <td>6.0</td>\n      <td>0.700</td>\n      <td>70.0</td>\n      <td>2017-07-09 23:00:00</td>\n      <td>B</td>\n 
   </tr>\n  </tbody>\n</table>\n<p>3067 rows × 8 columns</p>\n</div>"},"metadata":{}}],"execution_count":9},{"cell_type":"code","metadata":{"id":"2CC8939A7E2244E99C84EE1932528688","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"# Concatenate the air-pollution data for stations A and B\ndf_air = pd.concat([df_A_air,df_B_air])\ndf_air","outputs":[{"output_type":"execute_result","data":{"text/plain":"      PM2.5  PM10   NO2   SO2     CO     O3                time station\n0      54.0   NaN  17.0  16.0  0.824   90.0 2016-06-01 00:00:00       A\n1      34.0   NaN  14.0  16.0  0.774   83.0 2016-06-01 01:00:00       A\n2      22.0   NaN  19.0  16.0  0.787   78.0 2016-06-01 02:00:00       A\n3      20.0   NaN  13.0  16.0  0.750  115.0 2016-06-01 03:00:00       A\n4       4.0   NaN   9.0  15.0  0.770  125.0 2016-06-01 04:00:00       A\n...     ...   ...   ...   ...    ...    ...                 ...     ...\n3062   18.0  42.0  31.0   8.0  0.700   46.0 2017-07-09 07:00:00       B\n3063   18.0  34.0  26.0   7.0  0.700   76.0 2017-07-09 20:00:00       B\n3064   17.0  48.0  24.0   6.0  0.700   74.0 2017-07-09 21:00:00       B\n3065   17.0  42.0  20.0   6.0  0.600   67.0 2017-07-09 22:00:00       B\n3066   17.0  47.0  23.0   6.0  0.700   70.0 2017-07-09 23:00:00       B\n\n[6134 rows x 8 columns]","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>PM2.5</th>\n      <th>PM10</th>\n      <th>NO2</th>\n      <th>SO2</th>\n      <th>CO</th>\n      <th>O3</th>\n      <th>time</th>\n      <th>station</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>54.0</td>\n    
  <td>NaN</td>\n      <td>17.0</td>\n      <td>16.0</td>\n      <td>0.824</td>\n      <td>90.0</td>\n      <td>2016-06-01 00:00:00</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>34.0</td>\n      <td>NaN</td>\n      <td>14.0</td>\n      <td>16.0</td>\n      <td>0.774</td>\n      <td>83.0</td>\n      <td>2016-06-01 01:00:00</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>22.0</td>\n      <td>NaN</td>\n      <td>19.0</td>\n      <td>16.0</td>\n      <td>0.787</td>\n      <td>78.0</td>\n      <td>2016-06-01 02:00:00</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>20.0</td>\n      <td>NaN</td>\n      <td>13.0</td>\n      <td>16.0</td>\n      <td>0.750</td>\n      <td>115.0</td>\n      <td>2016-06-01 03:00:00</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>4.0</td>\n      <td>NaN</td>\n      <td>9.0</td>\n      <td>15.0</td>\n      <td>0.770</td>\n      <td>125.0</td>\n      <td>2016-06-01 04:00:00</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>3062</th>\n      <td>18.0</td>\n      <td>42.0</td>\n      <td>31.0</td>\n      <td>8.0</td>\n      <td>0.700</td>\n      <td>46.0</td>\n      <td>2017-07-09 07:00:00</td>\n      <td>B</td>\n    </tr>\n    <tr>\n      <th>3063</th>\n      <td>18.0</td>\n      <td>34.0</td>\n      <td>26.0</td>\n      <td>7.0</td>\n      <td>0.700</td>\n      <td>76.0</td>\n      <td>2017-07-09 20:00:00</td>\n      <td>B</td>\n    </tr>\n    <tr>\n      <th>3064</th>\n      <td>17.0</td>\n      <td>48.0</td>\n      <td>24.0</td>\n      <td>6.0</td>\n      <td>0.700</td>\n      <td>74.0</td>\n      <td>2017-07-09 21:00:00</td>\n      <td>B</td>\n    </tr>\n    <tr>\n      <th>3065</th>\n      <td>17.0</td>\n      <td>42.0</td>\n      
<td>20.0</td>\n      <td>6.0</td>\n      <td>0.600</td>\n      <td>67.0</td>\n      <td>2017-07-09 22:00:00</td>\n      <td>B</td>\n    </tr>\n    <tr>\n      <th>3066</th>\n      <td>17.0</td>\n      <td>47.0</td>\n      <td>23.0</td>\n      <td>6.0</td>\n      <td>0.700</td>\n      <td>70.0</td>\n      <td>2017-07-09 23:00:00</td>\n      <td>B</td>\n    </tr>\n  </tbody>\n</table>\n<p>6134 rows × 8 columns</p>\n</div>"},"metadata":{}}],"execution_count":10},{"cell_type":"markdown","metadata":{"id":"64E40EB94DE445B08929A019FC120102","notebookId":"66cecedfc175d1eb8493abb3","runtime":{"status":"default","execution_status":null,"is_visible":false},"jupyter":{},"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"}},"source":"### Build the training data"},{"cell_type":"code","metadata":{"id":"28FA6C3F257640E8805C9F62C69784B5","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"# Merge the weather and air-pollution data into the training set.\n# Joining on both time and station (an inner join by default) keeps the rows aligned.\ntrain = df_weather.merge(df_air, on=['time','station'])\ntrain","outputs":[{"output_type":"execute_result","data":{"text/plain":"     station                time  pressure     wd   ws   tem    rh  rain  \\\n0          A 2016-06-01 00:00:00    1002.2  120.0  2.4  22.3  97.0   0.6   \n1          A 2016-06-01 01:00:00    1002.3   37.0  0.7  22.4  98.0  10.5   \n2          A 2016-06-01 02:00:00    1002.4   56.0  1.7  22.1  97.0   0.4   \n3          A 2016-06-01 03:00:00    1002.0   78.0  2.4  21.1  97.0   0.0   \n4          A 2016-06-01 04:00:00    1002.0   66.0  1.8  20.5  97.0   0.0   \n...      ...                 ...       ...    ...  ...   ...   ...   ...   
\n6129       B 2017-07-09 07:00:00    1005.7   15.0  0.6  29.7  73.0   0.0   \n6130       B 2017-07-09 20:00:00    1003.3  279.0  0.4  31.7  59.0   0.0   \n6131       B 2017-07-09 21:00:00    1003.7  282.0  0.7  31.3  62.0   0.0   \n6132       B 2017-07-09 22:00:00    1003.3  295.0  0.3  30.7  64.0   0.0   \n6133       B 2017-07-09 23:00:00    1002.7  331.0  0.5  30.4  72.0   0.0   \n\n      PM2.5  PM10   NO2   SO2     CO     O3  \n0      54.0   NaN  17.0  16.0  0.824   90.0  \n1      34.0   NaN  14.0  16.0  0.774   83.0  \n2      22.0   NaN  19.0  16.0  0.787   78.0  \n3      20.0   NaN  13.0  16.0  0.750  115.0  \n4       4.0   NaN   9.0  15.0  0.770  125.0  \n...     ...   ...   ...   ...    ...    ...  \n6129   18.0  42.0  31.0   8.0  0.700   46.0  \n6130   18.0  34.0  26.0   7.0  0.700   76.0  \n6131   17.0  48.0  24.0   6.0  0.700   74.0  \n6132   17.0  42.0  20.0   6.0  0.600   67.0  \n6133   17.0  47.0  23.0   6.0  0.700   70.0  \n\n[6134 rows x 14 columns]","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>station</th>\n      <th>time</th>\n      <th>pressure</th>\n      <th>wd</th>\n      <th>ws</th>\n      <th>tem</th>\n      <th>rh</th>\n      <th>rain</th>\n      <th>PM2.5</th>\n      <th>PM10</th>\n      <th>NO2</th>\n      <th>SO2</th>\n      <th>CO</th>\n      <th>O3</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>A</td>\n      <td>2016-06-01 00:00:00</td>\n      <td>1002.2</td>\n      <td>120.0</td>\n      <td>2.4</td>\n      <td>22.3</td>\n      <td>97.0</td>\n      <td>0.6</td>\n      <td>54.0</td>\n      <td>NaN</td>\n      <td>17.0</td>\n      <td>16.0</td>\n      <td>0.824</td>\n      
<td>90.0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>A</td>\n      <td>2016-06-01 01:00:00</td>\n      <td>1002.3</td>\n      <td>37.0</td>\n      <td>0.7</td>\n      <td>22.4</td>\n      <td>98.0</td>\n      <td>10.5</td>\n      <td>34.0</td>\n      <td>NaN</td>\n      <td>14.0</td>\n      <td>16.0</td>\n      <td>0.774</td>\n      <td>83.0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>A</td>\n      <td>2016-06-01 02:00:00</td>\n      <td>1002.4</td>\n      <td>56.0</td>\n      <td>1.7</td>\n      <td>22.1</td>\n      <td>97.0</td>\n      <td>0.4</td>\n      <td>22.0</td>\n      <td>NaN</td>\n      <td>19.0</td>\n      <td>16.0</td>\n      <td>0.787</td>\n      <td>78.0</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>A</td>\n      <td>2016-06-01 03:00:00</td>\n      <td>1002.0</td>\n      <td>78.0</td>\n      <td>2.4</td>\n      <td>21.1</td>\n      <td>97.0</td>\n      <td>0.0</td>\n      <td>20.0</td>\n      <td>NaN</td>\n      <td>13.0</td>\n      <td>16.0</td>\n      <td>0.750</td>\n      <td>115.0</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>A</td>\n      <td>2016-06-01 04:00:00</td>\n      <td>1002.0</td>\n      <td>66.0</td>\n      <td>1.8</td>\n      <td>20.5</td>\n      <td>97.0</td>\n      <td>0.0</td>\n      <td>4.0</td>\n      <td>NaN</td>\n      <td>9.0</td>\n      <td>15.0</td>\n      <td>0.770</td>\n      <td>125.0</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>6129</th>\n      <td>B</td>\n      <td>2017-07-09 07:00:00</td>\n      <td>1005.7</td>\n      <td>15.0</td>\n      <td>0.6</td>\n      <td>29.7</td>\n      <td>73.0</td>\n      <td>0.0</td>\n      <td>18.0</td>\n      <td>42.0</td>\n      
<td>31.0</td>\n      <td>8.0</td>\n      <td>0.700</td>\n      <td>46.0</td>\n    </tr>\n    <tr>\n      <th>6130</th>\n      <td>B</td>\n      <td>2017-07-09 20:00:00</td>\n      <td>1003.3</td>\n      <td>279.0</td>\n      <td>0.4</td>\n      <td>31.7</td>\n      <td>59.0</td>\n      <td>0.0</td>\n      <td>18.0</td>\n      <td>34.0</td>\n      <td>26.0</td>\n      <td>7.0</td>\n      <td>0.700</td>\n      <td>76.0</td>\n    </tr>\n    <tr>\n      <th>6131</th>\n      <td>B</td>\n      <td>2017-07-09 21:00:00</td>\n      <td>1003.7</td>\n      <td>282.0</td>\n      <td>0.7</td>\n      <td>31.3</td>\n      <td>62.0</td>\n      <td>0.0</td>\n      <td>17.0</td>\n      <td>48.0</td>\n      <td>24.0</td>\n      <td>6.0</td>\n      <td>0.700</td>\n      <td>74.0</td>\n    </tr>\n    <tr>\n      <th>6132</th>\n      <td>B</td>\n      <td>2017-07-09 22:00:00</td>\n      <td>1003.3</td>\n      <td>295.0</td>\n      <td>0.3</td>\n      <td>30.7</td>\n      <td>64.0</td>\n      <td>0.0</td>\n      <td>17.0</td>\n      <td>42.0</td>\n      <td>20.0</td>\n      <td>6.0</td>\n      <td>0.600</td>\n      <td>67.0</td>\n    </tr>\n    <tr>\n      <th>6133</th>\n      <td>B</td>\n      <td>2017-07-09 23:00:00</td>\n      <td>1002.7</td>\n      <td>331.0</td>\n      <td>0.5</td>\n      <td>30.4</td>\n      <td>72.0</td>\n      <td>0.0</td>\n      <td>17.0</td>\n      <td>47.0</td>\n      <td>23.0</td>\n      <td>6.0</td>\n      <td>0.700</td>\n      <td>70.0</td>\n    </tr>\n  </tbody>\n</table>\n<p>6134 rows × 14 
columns</p>\n</div>"},"metadata":{}}],"execution_count":13},{"cell_type":"code","metadata":{"id":"7E597BAE70D74E1FB1C655010C63CB39","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"train.to_csv('train_data.csv',index=False)","outputs":[],"execution_count":14},{"cell_type":"code","metadata":{"id":"0A8BEF3265F74CBD91844F57D66C5C9D","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"# Drop rows where the O3 observation is missing\ntrain = train.dropna(axis=0,subset=['O3'])\ntrain['O3'] = train['O3'].astype(int)\ntrain","outputs":[{"output_type":"execute_result","data":{"text/plain":"     station                time  pressure     wd   ws   tem    rh  rain  \\\n0          A 2016-06-01 00:00:00    1002.2  120.0  2.4  22.3  97.0   0.6   \n1          A 2016-06-01 01:00:00    1002.3   37.0  0.7  22.4  98.0  10.5   \n2          A 2016-06-01 02:00:00    1002.4   56.0  1.7  22.1  97.0   0.4   \n3          A 2016-06-01 03:00:00    1002.0   78.0  2.4  21.1  97.0   0.0   \n4          A 2016-06-01 04:00:00    1002.0   66.0  1.8  20.5  97.0   0.0   \n...      ...                 ...       ...    ...  ...   ...   ...   ...   \n6129       B 2017-07-09 07:00:00    1005.7   15.0  0.6  29.7  73.0   0.0   \n6130       B 2017-07-09 20:00:00    1003.3  279.0  0.4  31.7  59.0   0.0   \n6131       B 2017-07-09 21:00:00    1003.7  282.0  0.7  31.3  62.0   0.0   \n6132       B 2017-07-09 22:00:00    1003.3  295.0  0.3  30.7  64.0   0.0   \n6133       B 2017-07-09 23:00:00    1002.7  331.0  0.5  30.4  72.0   0.0   \n\n      PM2.5  PM10   NO2   SO2     CO   O3  \n0      54.0   NaN  17.0  16.0  0.824   90  \n1      34.0   NaN  14.0  16.0  0.774   83  \n2      22.0   NaN  19.0  16.0  0.787   78  \n3      20.0   NaN  13.0  16.0  0.750  115  \n4       4.0   NaN   9.0  15.0  0.770  125  \n...     ...   ...   ...   ...    
...  ...  \n6129   18.0  42.0  31.0   8.0  0.700   46  \n6130   18.0  34.0  26.0   7.0  0.700   76  \n6131   17.0  48.0  24.0   6.0  0.700   74  \n6132   17.0  42.0  20.0   6.0  0.600   67  \n6133   17.0  47.0  23.0   6.0  0.700   70  \n\n[6027 rows x 14 columns]","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>station</th>\n      <th>time</th>\n      <th>pressure</th>\n      <th>wd</th>\n      <th>ws</th>\n      <th>tem</th>\n      <th>rh</th>\n      <th>rain</th>\n      <th>PM2.5</th>\n      <th>PM10</th>\n      <th>NO2</th>\n      <th>SO2</th>\n      <th>CO</th>\n      <th>O3</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>A</td>\n      <td>2016-06-01 00:00:00</td>\n      <td>1002.2</td>\n      <td>120.0</td>\n      <td>2.4</td>\n      <td>22.3</td>\n      <td>97.0</td>\n      <td>0.6</td>\n      <td>54.0</td>\n      <td>NaN</td>\n      <td>17.0</td>\n      <td>16.0</td>\n      <td>0.824</td>\n      <td>90</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>A</td>\n      <td>2016-06-01 01:00:00</td>\n      <td>1002.3</td>\n      <td>37.0</td>\n      <td>0.7</td>\n      <td>22.4</td>\n      <td>98.0</td>\n      <td>10.5</td>\n      <td>34.0</td>\n      <td>NaN</td>\n      <td>14.0</td>\n      <td>16.0</td>\n      <td>0.774</td>\n      <td>83</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>A</td>\n      <td>2016-06-01 02:00:00</td>\n      <td>1002.4</td>\n      <td>56.0</td>\n      <td>1.7</td>\n      <td>22.1</td>\n      <td>97.0</td>\n      <td>0.4</td>\n      <td>22.0</td>\n      <td>NaN</td>\n      <td>19.0</td>\n      <td>16.0</td>\n      <td>0.787</td>\n      <td>78</td>\n    </tr>\n    
<tr>\n      <th>3</th>\n      <td>A</td>\n      <td>2016-06-01 03:00:00</td>\n      <td>1002.0</td>\n      <td>78.0</td>\n      <td>2.4</td>\n      <td>21.1</td>\n      <td>97.0</td>\n      <td>0.0</td>\n      <td>20.0</td>\n      <td>NaN</td>\n      <td>13.0</td>\n      <td>16.0</td>\n      <td>0.750</td>\n      <td>115</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>A</td>\n      <td>2016-06-01 04:00:00</td>\n      <td>1002.0</td>\n      <td>66.0</td>\n      <td>1.8</td>\n      <td>20.5</td>\n      <td>97.0</td>\n      <td>0.0</td>\n      <td>4.0</td>\n      <td>NaN</td>\n      <td>9.0</td>\n      <td>15.0</td>\n      <td>0.770</td>\n      <td>125</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>6129</th>\n      <td>B</td>\n      <td>2017-07-09 07:00:00</td>\n      <td>1005.7</td>\n      <td>15.0</td>\n      <td>0.6</td>\n      <td>29.7</td>\n      <td>73.0</td>\n      <td>0.0</td>\n      <td>18.0</td>\n      <td>42.0</td>\n      <td>31.0</td>\n      <td>8.0</td>\n      <td>0.700</td>\n      <td>46</td>\n    </tr>\n    <tr>\n      <th>6130</th>\n      <td>B</td>\n      <td>2017-07-09 20:00:00</td>\n      <td>1003.3</td>\n      <td>279.0</td>\n      <td>0.4</td>\n      <td>31.7</td>\n      <td>59.0</td>\n      <td>0.0</td>\n      <td>18.0</td>\n      <td>34.0</td>\n      <td>26.0</td>\n      <td>7.0</td>\n      <td>0.700</td>\n      <td>76</td>\n    </tr>\n    <tr>\n      <th>6131</th>\n      <td>B</td>\n      <td>2017-07-09 21:00:00</td>\n      <td>1003.7</td>\n      <td>282.0</td>\n      <td>0.7</td>\n      <td>31.3</td>\n      <td>62.0</td>\n      <td>0.0</td>\n      <td>17.0</td>\n      <td>48.0</td>\n      <td>24.0</td>\n      <td>6.0</td>\n      
<td>0.700</td>\n      <td>74</td>\n    </tr>\n    <tr>\n      <th>6132</th>\n      <td>B</td>\n      <td>2017-07-09 22:00:00</td>\n      <td>1003.3</td>\n      <td>295.0</td>\n      <td>0.3</td>\n      <td>30.7</td>\n      <td>64.0</td>\n      <td>0.0</td>\n      <td>17.0</td>\n      <td>42.0</td>\n      <td>20.0</td>\n      <td>6.0</td>\n      <td>0.600</td>\n      <td>67</td>\n    </tr>\n    <tr>\n      <th>6133</th>\n      <td>B</td>\n      <td>2017-07-09 23:00:00</td>\n      <td>1002.7</td>\n      <td>331.0</td>\n      <td>0.5</td>\n      <td>30.4</td>\n      <td>72.0</td>\n      <td>0.0</td>\n      <td>17.0</td>\n      <td>47.0</td>\n      <td>23.0</td>\n      <td>6.0</td>\n      <td>0.700</td>\n      <td>70</td>\n    </tr>\n  </tbody>\n</table>\n<p>6027 rows × 14 columns</p>\n</div>"},"metadata":{}}],"execution_count":15},{"cell_type":"code","metadata":{"id":"9FD3CFBDEA1A4BAAABF8C3D7A381EC0A","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"# Add hour-of-day and month features\ntrain['hour'] = train.time.dt.hour\ntrain['month'] = train.time.dt.month","outputs":[],"execution_count":16},{"cell_type":"markdown","metadata":{"id":"2B18C0E9E26A445FB256C935FC17AFAB","notebookId":"66cecedfc175d1eb8493abb3","runtime":{"status":"default","execution_status":null,"is_visible":false},"jupyter":{},"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"}},"source":"## Ozone Prediction Model  \n### Model Training"},{"cell_type":"code","metadata":{"id":"C050B6E12D5B4579ABD9E772306C8574","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"reg1 = setup(data=train,\n            target='O3',                                # target column\n            categorical_features=['hour','month'],      # explicitly mark these columns as categorical\n            feature_selection=True,                     # enable feature selection\n            feature_selection_threshold=0.8,          
  # drop less important features\n            feature_interaction=True,                   # enable feature interaction\n            normalize=True,                             # normalize the features\n            silent=True\n            )","outputs":[{"output_type":"display_data","data":{"text/plain":"<pandas.io.formats.style.Styler at 0x7fae8c1835c0>","text/html":"<style  type=\"text/css\" >\n#T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row3_col1,#T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row27_col1,#T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row42_col1,#T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row50_col1,#T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row53_col1{\n            background-color:  lightgreen;\n        }</style><table id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4\" ><thead>    <tr>        <th class=\"blank level0\" ></th>        <th class=\"col_heading level0 col0\" >Description</th>        <th class=\"col_heading level0 col1\" >Value</th>    </tr></thead><tbody>\n                <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row0\" class=\"row_heading level0 row0\" >0</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row0_col0\" class=\"data row0 col0\" >session_id</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row0_col1\" class=\"data row0 col1\" >4527</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row1\" class=\"row_heading level0 row1\" >1</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row1_col0\" class=\"data row1 col0\" >Target</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row1_col1\" class=\"data row1 col1\" >O3</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row2\" class=\"row_heading level0 row2\" >2</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row2_col0\" class=\"data row2 col0\" >Original Data</td>\n  
                      <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row2_col1\" class=\"data row2 col1\" >(6027, 16)</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row3\" class=\"row_heading level0 row3\" >3</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row3_col0\" class=\"data row3 col0\" >Missing Values</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row3_col1\" class=\"data row3 col1\" >True</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row4\" class=\"row_heading level0 row4\" >4</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row4_col0\" class=\"data row4 col0\" >Numeric Features</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row4_col1\" class=\"data row4 col1\" >11</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row5\" class=\"row_heading level0 row5\" >5</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row5_col0\" class=\"data row5 col0\" >Categorical Features</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row5_col1\" class=\"data row5 col1\" >3</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row6\" class=\"row_heading level0 row6\" >6</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row6_col0\" class=\"data row6 col0\" >Ordinal Features</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row6_col1\" class=\"data row6 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row7\" class=\"row_heading level0 row7\" >7</th>\n                        <td 
id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row7_col0\" class=\"data row7 col0\" >High Cardinality Features</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row7_col1\" class=\"data row7 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row8\" class=\"row_heading level0 row8\" >8</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row8_col0\" class=\"data row8 col0\" >High Cardinality Method</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row8_col1\" class=\"data row8 col1\" >None</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row9\" class=\"row_heading level0 row9\" >9</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row9_col0\" class=\"data row9 col0\" >Transformed Train Set</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row9_col1\" class=\"data row9 col1\" >(4218, 81)</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row10\" class=\"row_heading level0 row10\" >10</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row10_col0\" class=\"data row10 col0\" >Transformed Test Set</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row10_col1\" class=\"data row10 col1\" >(1809, 81)</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row11\" class=\"row_heading level0 row11\" >11</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row11_col0\" class=\"data row11 col0\" >Shuffle Train-Test</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row11_col1\" class=\"data row11 col1\" >True</td>\n            </tr>\n            <tr>\n                 
       <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row12\" class=\"row_heading level0 row12\" >12</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row12_col0\" class=\"data row12 col0\" >Stratify Train-Test</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row12_col1\" class=\"data row12 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row13\" class=\"row_heading level0 row13\" >13</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row13_col0\" class=\"data row13 col0\" >Fold Generator</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row13_col1\" class=\"data row13 col1\" >KFold</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row14\" class=\"row_heading level0 row14\" >14</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row14_col0\" class=\"data row14 col0\" >Fold Number</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row14_col1\" class=\"data row14 col1\" >10</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row15\" class=\"row_heading level0 row15\" >15</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row15_col0\" class=\"data row15 col0\" >CPU Jobs</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row15_col1\" class=\"data row15 col1\" >-1</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row16\" class=\"row_heading level0 row16\" >16</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row16_col0\" class=\"data row16 col0\" >Use GPU</td>\n                        <td 
id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row16_col1\" class=\"data row16 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row17\" class=\"row_heading level0 row17\" >17</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row17_col0\" class=\"data row17 col0\" >Log Experiment</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row17_col1\" class=\"data row17 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row18\" class=\"row_heading level0 row18\" >18</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row18_col0\" class=\"data row18 col0\" >Experiment Name</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row18_col1\" class=\"data row18 col1\" >reg-default-name</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row19\" class=\"row_heading level0 row19\" >19</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row19_col0\" class=\"data row19 col0\" >USI</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row19_col1\" class=\"data row19 col1\" >590a</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row20\" class=\"row_heading level0 row20\" >20</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row20_col0\" class=\"data row20 col0\" >Imputation Type</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row20_col1\" class=\"data row20 col1\" >simple</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row21\" class=\"row_heading level0 row21\" >21</th>\n                        <td 
id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row21_col0\" class=\"data row21 col0\" >Iterative Imputation Iteration</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row21_col1\" class=\"data row21 col1\" >None</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row22\" class=\"row_heading level0 row22\" >22</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row22_col0\" class=\"data row22 col0\" >Numeric Imputer</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row22_col1\" class=\"data row22 col1\" >mean</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row23\" class=\"row_heading level0 row23\" >23</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row23_col0\" class=\"data row23 col0\" >Iterative Imputation Numeric Model</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row23_col1\" class=\"data row23 col1\" >None</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row24\" class=\"row_heading level0 row24\" >24</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row24_col0\" class=\"data row24 col0\" >Categorical Imputer</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row24_col1\" class=\"data row24 col1\" >constant</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row25\" class=\"row_heading level0 row25\" >25</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row25_col0\" class=\"data row25 col0\" >Iterative Imputation Categorical Model</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row25_col1\" class=\"data row25 col1\" >None</td>\n            
</tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row26\" class=\"row_heading level0 row26\" >26</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row26_col0\" class=\"data row26 col0\" >Unknown Categoricals Handling</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row26_col1\" class=\"data row26 col1\" >least_frequent</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row27\" class=\"row_heading level0 row27\" >27</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row27_col0\" class=\"data row27 col0\" >Normalize</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row27_col1\" class=\"data row27 col1\" >True</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row28\" class=\"row_heading level0 row28\" >28</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row28_col0\" class=\"data row28 col0\" >Normalize Method</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row28_col1\" class=\"data row28 col1\" >zscore</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row29\" class=\"row_heading level0 row29\" >29</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row29_col0\" class=\"data row29 col0\" >Transformation</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row29_col1\" class=\"data row29 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row30\" class=\"row_heading level0 row30\" >30</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row30_col0\" class=\"data row30 col0\" >Transformation 
Method</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row30_col1\" class=\"data row30 col1\" >None</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row31\" class=\"row_heading level0 row31\" >31</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row31_col0\" class=\"data row31 col0\" >PCA</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row31_col1\" class=\"data row31 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row32\" class=\"row_heading level0 row32\" >32</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row32_col0\" class=\"data row32 col0\" >PCA Method</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row32_col1\" class=\"data row32 col1\" >None</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row33\" class=\"row_heading level0 row33\" >33</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row33_col0\" class=\"data row33 col0\" >PCA Components</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row33_col1\" class=\"data row33 col1\" >None</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row34\" class=\"row_heading level0 row34\" >34</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row34_col0\" class=\"data row34 col0\" >Ignore Low Variance</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row34_col1\" class=\"data row34 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row35\" class=\"row_heading level0 row35\" >35</th>\n                 
       <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row35_col0\" class=\"data row35 col0\" >Combine Rare Levels</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row35_col1\" class=\"data row35 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row36\" class=\"row_heading level0 row36\" >36</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row36_col0\" class=\"data row36 col0\" >Rare Level Threshold</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row36_col1\" class=\"data row36 col1\" >None</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row37\" class=\"row_heading level0 row37\" >37</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row37_col0\" class=\"data row37 col0\" >Numeric Binning</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row37_col1\" class=\"data row37 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row38\" class=\"row_heading level0 row38\" >38</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row38_col0\" class=\"data row38 col0\" >Remove Outliers</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row38_col1\" class=\"data row38 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row39\" class=\"row_heading level0 row39\" >39</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row39_col0\" class=\"data row39 col0\" >Outliers Threshold</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row39_col1\" class=\"data row39 col1\" >None</td>\n            </tr>\n            <tr>\n                  
      <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row40\" class=\"row_heading level0 row40\" >40</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row40_col0\" class=\"data row40 col0\" >Remove Multicollinearity</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row40_col1\" class=\"data row40 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row41\" class=\"row_heading level0 row41\" >41</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row41_col0\" class=\"data row41 col0\" >Multicollinearity Threshold</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row41_col1\" class=\"data row41 col1\" >None</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row42\" class=\"row_heading level0 row42\" >42</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row42_col0\" class=\"data row42 col0\" >Remove Perfect Collinearity</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row42_col1\" class=\"data row42 col1\" >True</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row43\" class=\"row_heading level0 row43\" >43</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row43_col0\" class=\"data row43 col0\" >Clustering</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row43_col1\" class=\"data row43 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row44\" class=\"row_heading level0 row44\" >44</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row44_col0\" class=\"data row44 col0\" >Clustering Iteration</td>\n                        
<td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row44_col1\" class=\"data row44 col1\" >None</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row45\" class=\"row_heading level0 row45\" >45</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row45_col0\" class=\"data row45 col0\" >Polynomial Features</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row45_col1\" class=\"data row45 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row46\" class=\"row_heading level0 row46\" >46</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row46_col0\" class=\"data row46 col0\" >Polynomial Degree</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row46_col1\" class=\"data row46 col1\" >None</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row47\" class=\"row_heading level0 row47\" >47</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row47_col0\" class=\"data row47 col0\" >Trignometry Features</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row47_col1\" class=\"data row47 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row48\" class=\"row_heading level0 row48\" >48</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row48_col0\" class=\"data row48 col0\" >Polynomial Threshold</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row48_col1\" class=\"data row48 col1\" >None</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row49\" class=\"row_heading level0 row49\" >49</th>\n                        
<td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row49_col0\" class=\"data row49 col0\" >Group Features</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row49_col1\" class=\"data row49 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row50\" class=\"row_heading level0 row50\" >50</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row50_col0\" class=\"data row50 col0\" >Feature Selection</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row50_col1\" class=\"data row50 col1\" >True</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row51\" class=\"row_heading level0 row51\" >51</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row51_col0\" class=\"data row51 col0\" >Feature Selection Method</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row51_col1\" class=\"data row51 col1\" >classic</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row52\" class=\"row_heading level0 row52\" >52</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row52_col0\" class=\"data row52 col0\" >Features Selection Threshold</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row52_col1\" class=\"data row52 col1\" >0.800000</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row53\" class=\"row_heading level0 row53\" >53</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row53_col0\" class=\"data row53 col0\" >Feature Interaction</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row53_col1\" class=\"data row53 col1\" >True</td>\n            </tr>\n            <tr>\n     
                   <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row54\" class=\"row_heading level0 row54\" >54</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row54_col0\" class=\"data row54 col0\" >Feature Ratio</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row54_col1\" class=\"data row54 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row55\" class=\"row_heading level0 row55\" >55</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row55_col0\" class=\"data row55 col0\" >Interaction Threshold</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row55_col1\" class=\"data row55 col1\" >0.010000</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row56\" class=\"row_heading level0 row56\" >56</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row56_col0\" class=\"data row56 col0\" >Transform Target</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row56_col1\" class=\"data row56 col1\" >False</td>\n            </tr>\n            <tr>\n                        <th id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4level0_row57\" class=\"row_heading level0 row57\" >57</th>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row57_col0\" class=\"data row57 col0\" >Transform Target Method</td>\n                        <td id=\"T_a6c39f3e_6511_11ef_a9bc_eea5abdb52f4row57_col1\" class=\"data row57 col1\" >box-cox</td>\n            </tr>\n    </tbody></table>"},"metadata":{}}],"execution_count":17},{"cell_type":"code","metadata":{"id":"F387A8812FF54B9598C403F3CA620179","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"# 
Select the LightGBM model\nlightgbm = create_model('lightgbm')","outputs":[{"output_type":"display_data","data":{"text/plain":"<pandas.io.formats.style.Styler at 0x7fae9fc10dd8>","text/html":"<style  type=\"text/css\" >\n#T_af428512_6511_11ef_a9bc_eea5abdb52f4row10_col0,#T_af428512_6511_11ef_a9bc_eea5abdb52f4row10_col1,#T_af428512_6511_11ef_a9bc_eea5abdb52f4row10_col2,#T_af428512_6511_11ef_a9bc_eea5abdb52f4row10_col3,#T_af428512_6511_11ef_a9bc_eea5abdb52f4row10_col4,#T_af428512_6511_11ef_a9bc_eea5abdb52f4row10_col5{\n            background:  yellow;\n        }</style><table id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4\" ><thead>    <tr>        <th class=\"blank level0\" ></th>        <th class=\"col_heading level0 col0\" >MAE</th>        <th class=\"col_heading level0 col1\" >MSE</th>        <th class=\"col_heading level0 col2\" >RMSE</th>        <th class=\"col_heading level0 col3\" >R2</th>        <th class=\"col_heading level0 col4\" >RMSLE</th>        <th class=\"col_heading level0 col5\" >MAPE</th>    </tr>    <tr>        <th class=\"index_name level0\" >Fold</th>        <th class=\"blank\" ></th>        <th class=\"blank\" ></th>        <th class=\"blank\" ></th>        <th class=\"blank\" ></th>        <th class=\"blank\" ></th>        <th class=\"blank\" ></th>    </tr></thead><tbody>\n                <tr>\n                        <th id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4level0_row0\" class=\"row_heading level0 row0\" >0</th>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row0_col0\" class=\"data row0 col0\" >13.3411</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row0_col1\" class=\"data row0 col1\" >351.6029</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row0_col2\" class=\"data row0 col2\" >18.7511</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row0_col3\" class=\"data row0 col3\" >0.8775</td>\n                        <td 
id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row0_col4\" class=\"data row0 col4\" >0.2728</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row0_col5\" class=\"data row0 col5\" >0.2286</td>\n            </tr>\n            <tr>\n                        <th id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4level0_row1\" class=\"row_heading level0 row1\" >1</th>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row1_col0\" class=\"data row1 col0\" >14.2654</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row1_col1\" class=\"data row1 col1\" >387.0971</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row1_col2\" class=\"data row1 col2\" >19.6748</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row1_col3\" class=\"data row1 col3\" >0.8613</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row1_col4\" class=\"data row1 col4\" >0.2822</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row1_col5\" class=\"data row1 col5\" >0.2183</td>\n            </tr>\n            <tr>\n                        <th id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4level0_row2\" class=\"row_heading level0 row2\" >2</th>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row2_col0\" class=\"data row2 col0\" >14.5526</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row2_col1\" class=\"data row2 col1\" >423.9887</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row2_col2\" class=\"data row2 col2\" >20.5910</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row2_col3\" class=\"data row2 col3\" >0.8183</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row2_col4\" class=\"data row2 col4\" >0.2823</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row2_col5\" class=\"data row2 
col5\" >0.2322</td>\n            </tr>\n            <tr>\n                        <th id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4level0_row3\" class=\"row_heading level0 row3\" >3</th>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row3_col0\" class=\"data row3 col0\" >12.8306</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row3_col1\" class=\"data row3 col1\" >339.6728</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row3_col2\" class=\"data row3 col2\" >18.4302</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row3_col3\" class=\"data row3 col3\" >0.8634</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row3_col4\" class=\"data row3 col4\" >0.2729</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row3_col5\" class=\"data row3 col5\" >0.2202</td>\n            </tr>\n            <tr>\n                        <th id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4level0_row4\" class=\"row_heading level0 row4\" >4</th>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row4_col0\" class=\"data row4 col0\" >14.7178</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row4_col1\" class=\"data row4 col1\" >389.2159</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row4_col2\" class=\"data row4 col2\" >19.7286</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row4_col3\" class=\"data row4 col3\" >0.8694</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row4_col4\" class=\"data row4 col4\" >0.2828</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row4_col5\" class=\"data row4 col5\" >0.2374</td>\n            </tr>\n            <tr>\n                        <th id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4level0_row5\" class=\"row_heading level0 row5\" >5</th>\n            
            <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row5_col0\" class=\"data row5 col0\" >15.2525</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row5_col1\" class=\"data row5 col1\" >665.3487</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row5_col2\" class=\"data row5 col2\" >25.7944</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row5_col3\" class=\"data row5 col3\" >0.8225</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row5_col4\" class=\"data row5 col4\" >0.2754</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row5_col5\" class=\"data row5 col5\" >0.2221</td>\n            </tr>\n            <tr>\n                        <th id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4level0_row6\" class=\"row_heading level0 row6\" >6</th>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row6_col0\" class=\"data row6 col0\" >14.1889</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row6_col1\" class=\"data row6 col1\" >400.6694</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row6_col2\" class=\"data row6 col2\" >20.0167</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row6_col3\" class=\"data row6 col3\" >0.8446</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row6_col4\" class=\"data row6 col4\" >0.2921</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row6_col5\" class=\"data row6 col5\" >0.2668</td>\n            </tr>\n            <tr>\n                        <th id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4level0_row7\" class=\"row_heading level0 row7\" >7</th>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row7_col0\" class=\"data row7 col0\" >14.1749</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row7_col1\" 
class=\"data row7 col1\" >357.3390</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row7_col2\" class=\"data row7 col2\" >18.9034</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row7_col3\" class=\"data row7 col3\" >0.8631</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row7_col4\" class=\"data row7 col4\" >0.2633</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row7_col5\" class=\"data row7 col5\" >0.2173</td>\n            </tr>\n            <tr>\n                        <th id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4level0_row8\" class=\"row_heading level0 row8\" >8</th>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row8_col0\" class=\"data row8 col0\" >14.4572</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row8_col1\" class=\"data row8 col1\" >377.3934</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row8_col2\" class=\"data row8 col2\" >19.4266</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row8_col3\" class=\"data row8 col3\" >0.8488</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row8_col4\" class=\"data row8 col4\" >0.2970</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row8_col5\" class=\"data row8 col5\" >0.2504</td>\n            </tr>\n            <tr>\n                        <th id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4level0_row9\" class=\"row_heading level0 row9\" >9</th>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row9_col0\" class=\"data row9 col0\" >13.2234</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row9_col1\" class=\"data row9 col1\" >337.5156</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row9_col2\" class=\"data row9 col2\" >18.3716</td>\n                        <td 
id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row9_col3\" class=\"data row9 col3\" >0.8521</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row9_col4\" class=\"data row9 col4\" >0.2527</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row9_col5\" class=\"data row9 col5\" >0.2034</td>\n            </tr>\n            <tr>\n                        <th id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4level0_row10\" class=\"row_heading level0 row10\" >Mean</th>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row10_col0\" class=\"data row10 col0\" >14.1004</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row10_col1\" class=\"data row10 col1\" >402.9843</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row10_col2\" class=\"data row10 col2\" >19.9688</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row10_col3\" class=\"data row10 col3\" >0.8521</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row10_col4\" class=\"data row10 col4\" >0.2774</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row10_col5\" class=\"data row10 col5\" >0.2297</td>\n            </tr>\n            <tr>\n                        <th id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4level0_row11\" class=\"row_heading level0 row11\" >Std</th>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row11_col0\" class=\"data row11 col0\" >0.7099</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row11_col1\" class=\"data row11 col1\" >91.3217</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row11_col2\" class=\"data row11 col2\" >2.0567</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row11_col3\" class=\"data row11 col3\" >0.0183</td>\n                        <td 
id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row11_col4\" class=\"data row11 col4\" >0.0124</td>\n                        <td id=\"T_af428512_6511_11ef_a9bc_eea5abdb52f4row11_col5\" class=\"data row11 col5\" >0.0173</td>\n            </tr>\n    </tbody></table>"},"metadata":{}}],"execution_count":18},{"cell_type":"markdown","metadata":{"id":"9DE4472914B84C3780C457727C632196","notebookId":"66cecedfc175d1eb8493abb3","runtime":{"status":"default","execution_status":null,"is_visible":false},"jupyter":{},"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"}},"source":"### Model prediction"},{"cell_type":"code","metadata":{"id":"2A48DB34527448829A9D5ECF71D33BBB","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"def prepare_test_data(weather_file, air_file, station):\n    # Read the test-set meteorological data\n    test_weather = pd.read_csv(weather_file)\n    # Convert the time column to datetime\n    test_weather['time'] = pd.to_datetime(test_weather['time'])\n    test_weather[test_weather==999017]=np.nan\n    test_weather[test_weather==999999.0]=np.nan\n    test_weather[test_weather==999990.0]=np.nan\n    # Keep only the rows for the requested station\n    test_weather = test_weather[test_weather['station']==station]\n\n    # Read the test-set air-pollution data\n    df = pd.read_csv(air_file)\n\n    # Extract PM2.5 from the test set\n    df_PM25 = df[df['type']=='PM2.5'].loc[:,['date','hour','Station ' + station]]\n    df_PM25.rename(columns = {'Station ' + station:'PM2.5'},inplace=True)\n    df_PM25.reset_index(drop=True,inplace=True)\n\n    # Extract PM10 from the test set\n    df_PM10 = df[df['type']=='PM10'].loc[:,['date','hour','Station ' + station]]\n    df_PM10.rename(columns = {'Station ' + station:'PM10'},inplace=True)\n    df_PM10.reset_index(drop=True,inplace=True)\n    \n    # Extract SO2 from the test set\n    df_SO2 = df[df['type']=='SO2'].loc[:,['date','hour','Station ' + station]]\n    df_SO2.rename(columns = {'Station ' + station:'SO2'},inplace=True)\n    df_SO2.reset_index(drop=True,inplace=True)\n\n    # Extract NO2 from the test set\n    
df_NO2 = df[df['type']=='NO2'].loc[:,['date','hour','Station ' + station]]\n    df_NO2.rename(columns = {'Station ' + station:'NO2'},inplace=True)\n    df_NO2.reset_index(drop=True,inplace=True)\n\n    # Extract CO from the test set\n    df_CO = df[df['type']=='CO'].loc[:,['date','hour','Station ' + station]]\n    df_CO.rename(columns = {'Station ' + station:'CO'},inplace=True)\n    df_CO.reset_index(drop=True,inplace=True)\n\n    # Merge the test-set pollutant concentration data\n    df_all = pd.merge(df_PM25,df_PM10, how='left', on=['date','hour'])\n    df_all = pd.merge(df_all,df_NO2, how='left', on=['date','hour'])\n    df_all = pd.merge(df_all,df_SO2, how='left', on=['date','hour'])\n    df_all = pd.merge(df_all,df_CO, how='left', on=['date','hour'])\n\n    # Build the time column, then drop the original date and hour columns\n    df_all['date'] = df_all['date'].astype(str)\n    df_all['hour'] = df_all['hour'].astype(str)\n    df_all['time'] = pd.to_datetime(df_all['date'] +  df_all['hour'].str.zfill(2),format='%Y%m%d%H')\n    df_all.drop(columns=['date','hour'],inplace=True)\n\n    # Add the station column\n    df_all['station'] = station\n\n    # Merge the meteorological and pollutant data into the test set, joining on time and station so rows cannot be misaligned\n    test = test_weather.merge(df_all, on=['time','station'])\n\n    # Add time-based feature columns\n    test['hour'] = test.time.dt.hour\n    test['month'] = test.time.dt.month\n\n    return test","outputs":[],"execution_count":19},{"cell_type":"code","metadata":{"id":"01BAB6BA008C426ABCC11305D5C2D8AD","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"# Test-set data for station A\ntest_A = prepare_test_data('/home/mw/input/ozone/test_weather.csv',\n                           '/home/mw/input/ozone/test_air.csv','A')\ntest_A","outputs":[{"output_type":"execute_result","data":{"text/plain":"    station                time  pressure     wd   ws   tem  rh  rain  PM2.5  \\\n0         A 2017-07-10 00:00:00    1002.1  178.0  1.4  29.7  78   0.0   22.0   \n1         A 2017-07-10 01:00:00    1001.6  191.0  1.3  29.4  81  
 0.0   22.0   \n2         A 2017-07-10 02:00:00    1000.8  171.0  1.8  29.1  82   0.0   23.0   \n3         A 2017-07-10 03:00:00    1000.7  175.0  1.4  29.0  81   0.0   29.0   \n4         A 2017-07-10 04:00:00    1000.9  195.0  2.1  29.2  79   0.0   22.0   \n..      ...                 ...       ...    ...  ...   ...  ..   ...    ...   \n115       A 2017-07-14 19:00:00    1006.3   66.0  0.5  29.3  82   0.0   30.0   \n116       A 2017-07-14 20:00:00    1006.4   54.0  0.4  28.5  89   0.0   41.0   \n117       A 2017-07-14 21:00:00    1007.2  131.0  0.5  29.2  83   0.0   43.0   \n118       A 2017-07-14 22:00:00    1006.6  146.0  0.4  29.8  82   0.0   42.0   \n119       A 2017-07-14 23:00:00    1006.9   68.0  0.5  28.5  89   0.0   32.0   \n\n      PM10    NO2   SO2   CO  hour  month  \n0     54.0   32.0   9.0  0.5     0      7  \n1     52.0   25.0   9.0  0.5     1      7  \n2     51.0   19.0   8.0  0.5     2      7  \n3     47.0   18.0   9.0  0.4     3      7  \n4     42.0   19.0   9.0  0.4     4      7  \n..     ...    ...   ...  ...   ...    ...  
\n115   75.0   93.0  11.0  0.8    19      7  \n116   91.0  120.0  10.0  0.9    20      7  \n117  108.0  116.0   9.0  0.9    21      7  \n118   85.0   43.0   8.0  0.5    22      7  \n119   70.0   37.0   9.0  0.5    23      7  \n\n[120 rows x 15 columns]","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>station</th>\n      <th>time</th>\n      <th>pressure</th>\n      <th>wd</th>\n      <th>ws</th>\n      <th>tem</th>\n      <th>rh</th>\n      <th>rain</th>\n      <th>PM2.5</th>\n      <th>PM10</th>\n      <th>NO2</th>\n      <th>SO2</th>\n      <th>CO</th>\n      <th>hour</th>\n      <th>month</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>A</td>\n      <td>2017-07-10 00:00:00</td>\n      <td>1002.1</td>\n      <td>178.0</td>\n      <td>1.4</td>\n      <td>29.7</td>\n      <td>78</td>\n      <td>0.0</td>\n      <td>22.0</td>\n      <td>54.0</td>\n      <td>32.0</td>\n      <td>9.0</td>\n      <td>0.5</td>\n      <td>0</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>A</td>\n      <td>2017-07-10 01:00:00</td>\n      <td>1001.6</td>\n      <td>191.0</td>\n      <td>1.3</td>\n      <td>29.4</td>\n      <td>81</td>\n      <td>0.0</td>\n      <td>22.0</td>\n      <td>52.0</td>\n      <td>25.0</td>\n      <td>9.0</td>\n      <td>0.5</td>\n      <td>1</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>A</td>\n      <td>2017-07-10 02:00:00</td>\n      <td>1000.8</td>\n      <td>171.0</td>\n      <td>1.8</td>\n      <td>29.1</td>\n      <td>82</td>\n      <td>0.0</td>\n      <td>23.0</td>\n      <td>51.0</td>\n      <td>19.0</td>\n      <td>8.0</td>\n      <td>0.5</td>\n    
  <td>2</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>A</td>\n      <td>2017-07-10 03:00:00</td>\n      <td>1000.7</td>\n      <td>175.0</td>\n      <td>1.4</td>\n      <td>29.0</td>\n      <td>81</td>\n      <td>0.0</td>\n      <td>29.0</td>\n      <td>47.0</td>\n      <td>18.0</td>\n      <td>9.0</td>\n      <td>0.4</td>\n      <td>3</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>A</td>\n      <td>2017-07-10 04:00:00</td>\n      <td>1000.9</td>\n      <td>195.0</td>\n      <td>2.1</td>\n      <td>29.2</td>\n      <td>79</td>\n      <td>0.0</td>\n      <td>22.0</td>\n      <td>42.0</td>\n      <td>19.0</td>\n      <td>9.0</td>\n      <td>0.4</td>\n      <td>4</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>115</th>\n      <td>A</td>\n      <td>2017-07-14 19:00:00</td>\n      <td>1006.3</td>\n      <td>66.0</td>\n      <td>0.5</td>\n      <td>29.3</td>\n      <td>82</td>\n      <td>0.0</td>\n      <td>30.0</td>\n      <td>75.0</td>\n      <td>93.0</td>\n      <td>11.0</td>\n      <td>0.8</td>\n      <td>19</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>116</th>\n      <td>A</td>\n      <td>2017-07-14 20:00:00</td>\n      <td>1006.4</td>\n      <td>54.0</td>\n      <td>0.4</td>\n      <td>28.5</td>\n      <td>89</td>\n      <td>0.0</td>\n      <td>41.0</td>\n      <td>91.0</td>\n      <td>120.0</td>\n      <td>10.0</td>\n      <td>0.9</td>\n      <td>20</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>117</th>\n      <td>A</td>\n      <td>2017-07-14 21:00:00</td>\n      <td>1007.2</td>\n      <td>131.0</td>\n      <td>0.5</td>\n      <td>29.2</td>\n      
<td>83</td>\n      <td>0.0</td>\n      <td>43.0</td>\n      <td>108.0</td>\n      <td>116.0</td>\n      <td>9.0</td>\n      <td>0.9</td>\n      <td>21</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>118</th>\n      <td>A</td>\n      <td>2017-07-14 22:00:00</td>\n      <td>1006.6</td>\n      <td>146.0</td>\n      <td>0.4</td>\n      <td>29.8</td>\n      <td>82</td>\n      <td>0.0</td>\n      <td>42.0</td>\n      <td>85.0</td>\n      <td>43.0</td>\n      <td>8.0</td>\n      <td>0.5</td>\n      <td>22</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>119</th>\n      <td>A</td>\n      <td>2017-07-14 23:00:00</td>\n      <td>1006.9</td>\n      <td>68.0</td>\n      <td>0.5</td>\n      <td>28.5</td>\n      <td>89</td>\n      <td>0.0</td>\n      <td>32.0</td>\n      <td>70.0</td>\n      <td>37.0</td>\n      <td>9.0</td>\n      <td>0.5</td>\n      <td>23</td>\n      <td>7</td>\n    </tr>\n  </tbody>\n</table>\n<p>120 rows × 15 columns</p>\n</div>"},"metadata":{}}],"execution_count":20},{"cell_type":"code","metadata":{"id":"E158DDB58A414410B9CAF2F01E645761","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"# Test-set data for station B\ntest_B = prepare_test_data('/home/mw/input/ozone/test_weather.csv',\n                           '/home/mw/input/ozone/test_air.csv','B')\ntest_B","outputs":[{"output_type":"execute_result","data":{"text/plain":"    station                time  pressure     wd   ws   tem  rh  rain  PM2.5  \\\n0         B 2017-07-10 00:00:00    1002.1  241.0  0.5  30.6  69   0.0   24.0   \n1         B 2017-07-10 01:00:00    1001.6  341.0  0.5  30.4  72   0.0   25.0   \n2         B 2017-07-10 02:00:00    1000.8  349.0  0.3  30.1  73   0.0   27.0   \n3         B 2017-07-10 03:00:00    1000.7  342.0  0.4  30.1  72   0.0   28.0   \n4         B 2017-07-10 04:00:00    1000.9  300.0  0.5  30.2  70   0.0   28.0   \n..      ...                 ...       ...    ...  ... 
  ...  ..   ...    ...   \n115       B 2017-07-14 19:00:00    1006.1    NaN  0.2  32.9  58   0.0    3.0   \n116       B 2017-07-14 20:00:00    1006.3   43.0  0.5  32.4  62   0.0   43.0   \n117       B 2017-07-14 21:00:00    1007.1  301.0  0.5  32.2  65   0.0   55.0   \n118       B 2017-07-14 22:00:00    1006.4   15.0  0.3  31.7  71   0.0   52.0   \n119       B 2017-07-14 23:00:00    1006.9  270.0  0.5  31.4  69   0.0   47.0   \n\n      PM10   NO2   SO2   CO  hour  month  \n0     57.0  21.0   5.0  0.7     0      7  \n1     46.0  17.0   5.0  0.7     1      7  \n2     47.0  15.0   6.0  0.6     2      7  \n3     38.0  13.0   6.0  0.6     3      7  \n4     33.0  12.0   6.0  0.6     4      7  \n..     ...   ...   ...  ...   ...    ...  \n115    7.0  55.0   5.0  1.0    19      7  \n116   88.0  68.0   6.0  1.0    20      7  \n117  102.0  47.0  11.0  0.9    21      7  \n118  138.0  33.0   6.0  0.8    22      7  \n119   91.0  37.0   5.0  0.9    23      7  \n\n[120 rows x 15 columns]","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>station</th>\n      <th>time</th>\n      <th>pressure</th>\n      <th>wd</th>\n      <th>ws</th>\n      <th>tem</th>\n      <th>rh</th>\n      <th>rain</th>\n      <th>PM2.5</th>\n      <th>PM10</th>\n      <th>NO2</th>\n      <th>SO2</th>\n      <th>CO</th>\n      <th>hour</th>\n      <th>month</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>B</td>\n      <td>2017-07-10 00:00:00</td>\n      <td>1002.1</td>\n      <td>241.0</td>\n      <td>0.5</td>\n      <td>30.6</td>\n      <td>69</td>\n      <td>0.0</td>\n      <td>24.0</td>\n      <td>57.0</td>\n      <td>21.0</td>\n      <td>5.0</td>\n   
   <td>0.7</td>\n      <td>0</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>B</td>\n      <td>2017-07-10 01:00:00</td>\n      <td>1001.6</td>\n      <td>341.0</td>\n      <td>0.5</td>\n      <td>30.4</td>\n      <td>72</td>\n      <td>0.0</td>\n      <td>25.0</td>\n      <td>46.0</td>\n      <td>17.0</td>\n      <td>5.0</td>\n      <td>0.7</td>\n      <td>1</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>B</td>\n      <td>2017-07-10 02:00:00</td>\n      <td>1000.8</td>\n      <td>349.0</td>\n      <td>0.3</td>\n      <td>30.1</td>\n      <td>73</td>\n      <td>0.0</td>\n      <td>27.0</td>\n      <td>47.0</td>\n      <td>15.0</td>\n      <td>6.0</td>\n      <td>0.6</td>\n      <td>2</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>B</td>\n      <td>2017-07-10 03:00:00</td>\n      <td>1000.7</td>\n      <td>342.0</td>\n      <td>0.4</td>\n      <td>30.1</td>\n      <td>72</td>\n      <td>0.0</td>\n      <td>28.0</td>\n      <td>38.0</td>\n      <td>13.0</td>\n      <td>6.0</td>\n      <td>0.6</td>\n      <td>3</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>B</td>\n      <td>2017-07-10 04:00:00</td>\n      <td>1000.9</td>\n      <td>300.0</td>\n      <td>0.5</td>\n      <td>30.2</td>\n      <td>70</td>\n      <td>0.0</td>\n      <td>28.0</td>\n      <td>33.0</td>\n      <td>12.0</td>\n      <td>6.0</td>\n      <td>0.6</td>\n      <td>4</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>115</th>\n      <td>B</td>\n      <td>2017-07-14 19:00:00</td>\n      <td>1006.1</td>\n      <td>NaN</td>\n      <td>0.2</td>\n      
<td>32.9</td>\n      <td>58</td>\n      <td>0.0</td>\n      <td>3.0</td>\n      <td>7.0</td>\n      <td>55.0</td>\n      <td>5.0</td>\n      <td>1.0</td>\n      <td>19</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>116</th>\n      <td>B</td>\n      <td>2017-07-14 20:00:00</td>\n      <td>1006.3</td>\n      <td>43.0</td>\n      <td>0.5</td>\n      <td>32.4</td>\n      <td>62</td>\n      <td>0.0</td>\n      <td>43.0</td>\n      <td>88.0</td>\n      <td>68.0</td>\n      <td>6.0</td>\n      <td>1.0</td>\n      <td>20</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>117</th>\n      <td>B</td>\n      <td>2017-07-14 21:00:00</td>\n      <td>1007.1</td>\n      <td>301.0</td>\n      <td>0.5</td>\n      <td>32.2</td>\n      <td>65</td>\n      <td>0.0</td>\n      <td>55.0</td>\n      <td>102.0</td>\n      <td>47.0</td>\n      <td>11.0</td>\n      <td>0.9</td>\n      <td>21</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>118</th>\n      <td>B</td>\n      <td>2017-07-14 22:00:00</td>\n      <td>1006.4</td>\n      <td>15.0</td>\n      <td>0.3</td>\n      <td>31.7</td>\n      <td>71</td>\n      <td>0.0</td>\n      <td>52.0</td>\n      <td>138.0</td>\n      <td>33.0</td>\n      <td>6.0</td>\n      <td>0.8</td>\n      <td>22</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>119</th>\n      <td>B</td>\n      <td>2017-07-14 23:00:00</td>\n      <td>1006.9</td>\n      <td>270.0</td>\n      <td>0.5</td>\n      <td>31.4</td>\n      <td>69</td>\n      <td>0.0</td>\n      <td>47.0</td>\n      <td>91.0</td>\n      <td>37.0</td>\n      <td>5.0</td>\n      <td>0.9</td>\n      <td>23</td>\n      <td>7</td>\n    </tr>\n  </tbody>\n</table>\n<p>120 rows × 15 columns</p>\n</div>"},"metadata":{}}],"execution_count":21},{"cell_type":"code","metadata":{"id":"9BE0E033867443B084C87CAECB9388EC","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"# 
Ozone prediction for station A\npredictions_A = predict_model(lightgbm, data=test_A)\npredictions_A['Label']","outputs":[{"output_type":"execute_result","data":{"text/plain":"0      66.524537\n1      69.162017\n2      75.579033\n3      79.405626\n4      72.350943\n         ...    \n115    30.306369\n116    34.504162\n117    53.702502\n118    95.520017\n119    54.897073\nName: Label, Length: 120, dtype: float64"},"metadata":{}}],"execution_count":22},{"cell_type":"code","metadata":{"id":"9372B1224EC44BD38F7A83477B214707","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"# Ozone prediction for station B\npredictions_B = predict_model(lightgbm, data=test_B)\npredictions_B['Label']","outputs":[{"output_type":"execute_result","data":{"text/plain":"0       95.691817\n1       84.384109\n2       90.199761\n3       83.602762\n4      100.981559\n          ...    \n115     83.332531\n116    112.218699\n117    141.707142\n118    148.158535\n119    133.121250\nName: Label, Length: 120, dtype: float64"},"metadata":{}}],"execution_count":23},{"cell_type":"code","metadata":{"id":"59570E7A79A6470F9582D19DC884E633","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"# Read the sample submission file\ndf = pd.read_csv('/home/mw/input/ozone/answer_sample.csv')\ndf","outputs":[{"output_type":"execute_result","data":{"text/plain":"      id  O3\n0      1 NaN\n1      2 NaN\n2      3 NaN\n3      4 NaN\n4      5 NaN\n..   ...  
..\n235  236 NaN\n236  237 NaN\n237  238 NaN\n238  239 NaN\n239  240 NaN\n\n[240 rows x 2 columns]","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>id</th>\n      <th>O3</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>3</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>4</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>5</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>235</th>\n      <td>236</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>236</th>\n      <td>237</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>237</th>\n      <td>238</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>238</th>\n      <td>239</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>239</th>\n      <td>240</td>\n      <td>NaN</td>\n    </tr>\n  </tbody>\n</table>\n<p>240 rows × 2 columns</p>\n</div>"},"metadata":{}}],"execution_count":24},{"cell_type":"code","metadata":{"id":"03BFF10024EB45ACA63FCF51CBE76CED","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"# Fill the sample submission with the predictions\n# NOTE: rows 0-119 (the first 120) hold the ozone predictions for station A\n# .loc replaces the chained df.iloc[...]['O3'] assignment, which may write to a temporary copy\ndf.loc[0:119, 'O3'] = np.array(predictions_A['Label'])\n# NOTE: rows 120-239 (the last 120) hold the ozone predictions for station B\ndf.loc[120:239, 'O3'] = np.array(predictions_B['Label'])\ndf","outputs":[{"output_type":"execute_result","data":{"text/plain":"  
    id          O3\n0      1   66.524537\n1      2   69.162017\n2      3   75.579033\n3      4   79.405626\n4      5   72.350943\n..   ...         ...\n235  236   83.332531\n236  237  112.218699\n237  238  141.707142\n238  239  148.158535\n239  240  133.121250\n\n[240 rows x 2 columns]","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>id</th>\n      <th>O3</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1</td>\n      <td>66.524537</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2</td>\n      <td>69.162017</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>3</td>\n      <td>75.579033</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>4</td>\n      <td>79.405626</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>5</td>\n      <td>72.350943</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>235</th>\n      <td>236</td>\n      <td>83.332531</td>\n    </tr>\n    <tr>\n      <th>236</th>\n      <td>237</td>\n      <td>112.218699</td>\n    </tr>\n    <tr>\n      <th>237</th>\n      <td>238</td>\n      <td>141.707142</td>\n    </tr>\n    <tr>\n      <th>238</th>\n      <td>239</td>\n      <td>148.158535</td>\n    </tr>\n    <tr>\n      <th>239</th>\n      <td>240</td>\n      <td>133.121250</td>\n    </tr>\n  </tbody>\n</table>\n<p>240 rows × 2 columns</p>\n</div>"},"metadata":{}}],"execution_count":25},{"cell_type":"code","metadata":{"id":"693ABCBE0D994D2D9E8CC455BF26477B","notebookId":"66cecedfc175d1eb8493abb3","jupyter":{},"collapsed":false,"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"},"trusted":true},"source":"# 
Save the submission file\ndf.to_csv('submit.csv', index=False)","outputs":[],"execution_count":26},{"cell_type":"markdown","metadata":{"id":"8BE44FCB4E1B4CE2AACA1EF62161B972","notebookId":"66cecedfc175d1eb8493abb3","runtime":{"status":"default","execution_status":null,"is_visible":false},"jupyter":{},"scrolled":false,"tags":[],"slideshow":{"slide_type":"slide"}},"source":"## Exploration  \n1. A single model may carry **considerable uncertainty**; **ensembling multiple models** may further improve accuracy.  \n\n2. Stations A and B are **not independent of each other**: they are geographically close, so **joint prediction** that uses the meteorological and pollutant data from both stations at once may work better.  \n"}],"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"name":"python","mimetype":"text/x-python","nbconvert_exporter":"python","file_extension":".py","version":"3.5.2","pygments_lexer":"ipython3"}},"nbformat":4,"nbformat_minor":0}